Authors: ZHANG Chunhao (张春昊), XIE Bin (解滨) [1,2,3], XU Tongtong (徐童童), ZHANG Ximei (张喜梅) [1]
Affiliations: [1] College of Computer and Cyber Security, Hebei Normal University, Shijiazhuang 050024, Hebei, China; [2] Hebei Provincial Engineering Research Center for Supply Chain Big Data Analytics & Data Security (Hebei Normal University), Shijiazhuang 050024, Hebei, China; [3] Hebei Provincial Key Laboratory of Network & Information Security (Hebei Normal University), Shijiazhuang 050024, Hebei, China
Source: Journal of Shandong University (Natural Science) (《山东大学学报(理学版)》), 2025, No. 1, pp. 29-44 (16 pages)
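Funding: National Natural Science Foundation of China (62076088); Fundamental Research Funds for Central Research Institutes (SK202324); Technology Innovation Fund of Hebei Normal University (L2020K09).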
Abstract: This paper uses the natural neighbor search algorithm to address several shortcomings of the clustering by fast search and find of density peaks (CFSFDP) algorithm, and proposes a density peak clustering algorithm optimized by natural neighbor search (NaN-CFSFDP). First, an outlier detection method is proposed based on the natural neighbor search algorithm. Second, because the truncation distance d_c in CFSFDP is difficult to set accurately by hand, the calculation of d_c is reworked using natural neighbor search so that d_c is determined automatically. Third, the sample density measure of CFSFDP is redesigned and unified so that it pays more attention to the local information of each sample. Finally, when density differs greatly between clusters, the density peak points may concentrate in the dense clusters, which leads to cluster loss; to address this, the concepts of shared natural neighbors of samples and shared natural neighbors of clusters are introduced and used to construct a new cluster fusion algorithm. Experimental results on synthetic and real datasets show that, in most cases, NaN-CFSFDP outperforms or is at least comparable to the compared methods in clustering performance, and that it has fewer parameters than CFSFDP and its improved variants.
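For orientation, the baseline CFSFDP quantities that the abstract builds on can be sketched as below. This is a minimal illustration of the original density peaks procedure (cutoff-kernel density rho, distance delta to the nearest denser point, and ranking by rho * delta), not the NaN-CFSFDP method of the paper. The function names, the toy data, and the hand-picked value of d_c are all assumptions made for illustration.

```python
import math


def cfsfdp_scores(points, d_c):
    """Return (rho, delta) per point under the baseline CFSFDP rules.

    Hypothetical helper for illustration; d_c is supplied by hand here,
    which is the manual step the abstract says is hard to get right.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # rho_i: number of other points strictly closer than the cutoff d_c.
    rho = [
        sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
        for i in range(n)
    ]

    delta = []
    for i in range(n):
        # Distances to all points that are denser than point i.
        to_denser = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # A point with no denser point takes its largest distance instead.
        delta.append(min(to_denser) if to_denser else max(dist[i]))
    return rho, delta


def pick_centers(rho, delta, k):
    """Rank points by gamma = rho * delta and return the top-k indices."""
    gamma = [r * d for r, d in zip(rho, delta)]
    order = sorted(range(len(gamma)), key=lambda i: gamma[i], reverse=True)
    return order[:k]


# Toy data (assumed): two well-separated groups, each with a central point.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0), (10.5, 10.5),
]
rho, delta = cfsfdp_scores(points, d_c=0.8)
centers = pick_centers(rho, delta, k=2)
```

Both d_c and the number of centers k are chosen by hand in this sketch. Per the abstract, the paper replaces the manual choice of d_c with a value derived from natural neighbor search; how that is done is not specified in this record.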