Density peak clustering algorithm optimized by natural neighbor search


Authors: ZHANG Chunhao (张春昊); XIE Bin (解滨) [1,2,3]; XU Tongtong (徐童童); ZHANG Ximei (张喜梅) [1]

Affiliations: [1] College of Computer and Cyber Security, Hebei Normal University, Shijiazhuang 050024, Hebei, China; [2] Hebei Provincial Engineering Research Center for Supply Chain Big Data Analytics & Data Security (Hebei Normal University), Shijiazhuang 050024, Hebei, China; [3] Hebei Provincial Key Laboratory of Network & Information Security (Hebei Normal University), Shijiazhuang 050024, Hebei, China

Source: Journal of Shandong University (Natural Science), 2025, Issue 1, pp. 29-44 (16 pages)

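Funding: National Natural Science Foundation of China (62076088); Fundamental Research Funds for the Central Research Institutes (SK202324); Technology Innovation Fund of Hebei Normal University (L2020K09).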

Abstract: To address several problems of the clustering by fast search and find of density peaks (CFSFDP) algorithm, this paper incorporates the natural neighbor search algorithm and proposes the density peak clustering algorithm optimized by natural neighbor search (NaN-CFSFDP). First, an outlier sample detection method based on natural neighbor search is proposed. Second, because the cutoff distance d_c in CFSFDP is difficult to set accurately by hand, its computation is redesigned with natural neighbor search so that d_c is selected automatically. Third, the sample density measure of CFSFDP is redesigned and unified so that it focuses more on the local information of each sample. Finally, when densities differ greatly between clusters, the density peaks tend to concentrate in the dense clusters and sparser clusters are lost; to address this, the concepts of shared natural neighbors between samples and shared natural neighbors between clusters are introduced and used to construct a new cluster fusion algorithm. Experimental results on synthetic and real datasets show that, in most cases, NaN-CFSFDP outperforms or is at least comparable to the compared methods in clustering performance, and that it requires fewer parameters than CFSFDP and its improved variants.
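The abstract builds on natural neighbor (NaN) search without spelling out its steps. The following is a minimal sketch of the search as it is commonly described in the literature, not the authors' implementation: the neighborhood size r grows one step at a time, each sample's reverse neighbors are tracked, the loop stops once no sample lacks a reverse neighbor or that count stops changing, and the final r is taken as the natural eigenvalue λ. Natural neighbors are then the mutual neighbors within λ. The outlier rule used here (a sample that no other sample lists among its λ nearest neighbors) and the helper `shared_natural_neighbors` are assumptions for illustration; all function names are hypothetical, and the paper's own outlier test, d_c formula, density measure, and fusion criterion are not reproduced.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def natural_neighbor_search(points):
    """Sketch of natural neighbor (NaN) search (hypothetical helper).

    Returns (lam, nan_sets, outliers): lam is the natural eigenvalue,
    nan_sets[i] is the set of natural neighbors of sample i, and outliers
    lists samples that no other sample has among its lam nearest
    neighbors (an assumed outlier rule, not necessarily the paper's).
    """
    n = len(points)
    # Pre-sort every sample's neighbors by distance (brute force, small n only).
    order = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: euclidean(points[i], points[j]))
        order.append(others)

    knn = [set() for _ in range(n)]  # NN_r(i): the r nearest neighbors of i
    rnn = [set() for _ in range(n)]  # RNN_r(i): samples whose NN_r contains i
    r, prev_zero = 0, None
    while r < n - 1:
        for i in range(n):
            j = order[i][r]  # the (r + 1)-th nearest neighbor of i
            knn[i].add(j)
            rnn[j].add(i)
        r += 1
        # Number of samples that nobody has chosen as a neighbor yet.
        zero = sum(1 for i in range(n) if not rnn[i])
        # Stop when every sample has a reverse neighbor or the count is stable.
        if zero == 0 or zero == prev_zero:
            break
        prev_zero = zero

    # Natural neighbors: mutual neighbors within the final neighborhood size.
    nan_sets = [knn[i] & rnn[i] for i in range(n)]
    outliers = [i for i in range(n) if not rnn[i]]
    return r, nan_sets, outliers


def shared_natural_neighbors(nan_sets, i, j):
    """Hypothetical helper: number of natural neighbors samples i and j share."""
    return len(nan_sets[i] & nan_sets[j])
```

For example, with four points near the origin, four near (10, 10), and one point at (50, 50), the search stops at λ = 3, the isolated point is the only sample flagged as an outlier, and `shared_natural_neighbors(nan_sets, 0, 1)` returns 2 while samples from different groups share none. On larger data, a k-d tree or ball tree query would replace the quadratic neighbor sort.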

Keywords: density peak; natural neighbor; clustering; cluster fusion; outlier sample; cutoff distance

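CLC numbers: TP391 [Automation and Computer Technology / Computer Application Technology]; TP181 [Automation and Computer Technology / Computer Science and Technology]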
