Adaptive Density Peak Clustering Algorithm Based on Shared Nearest Neighbor

Cited by: 1

Authors: WANG Xingeng (王心耕); DU Tao (杜韬) [1,2]; ZHOU Jin (周劲); CHEN Di (陈迪) [1]; WU Yunzheng (仵匀政)

Affiliations: [1] College of Information Science and Engineering, University of Jinan, Jinan 250024, China; [2] Shandong Provincial Key Laboratory of Network Based Intelligent Computing, Jinan 250024, China

Source: Computer Science (计算机科学), 2024, Issue 8, pp. 97-105 (9 pages)

Funding: National Natural Science Foundation of China (62273164); Joint Fund of the Natural Science Foundation of Shandong Province (ZR2020LZH009).

Abstract: The density peak clustering algorithm (DPC) is a simple and efficient unsupervised clustering algorithm. It can discover cluster centers automatically and cluster data of arbitrary shape efficiently, but it still has three defects. First, it ignores the positional information of the data when defining its measures. Second, the number of cluster centers must be set manually in advance. Third, the assignment of sample points is prone to a chain reaction of errors. To address these defects, this paper proposes an adaptive density peak clustering algorithm based on shared nearest neighbors. First, the local density and related measures are redefined using shared nearest neighbors. This fully accounts for the local characteristics of the data distribution, so that the spatial distribution of the sample points is better reflected. Second, a density decay phenomenon is introduced so that the sample points automatically gather into micro-clusters. As a result, the number of clusters is determined and the cluster centers are selected adaptively. Finally, a two-stage assignment method is proposed. The micro-clusters are first merged to form the backbone of each cluster, and these backbones then guide the assignment of the remaining points, which prevents chain reactions. Experiments on two-dimensional synthetic datasets and UCI datasets show that the proposed algorithm outperforms the classical density peak clustering algorithm and its recent improved variants in most cases.
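The abstract criticizes the classical DPC pipeline. For readers unfamiliar with that baseline, the following is a minimal Python sketch of it under stated assumptions. Distances are Euclidean, and shared nearest neighbors are defined as the overlap of two points' k-nearest-neighbor sets. The density function `snn_density` is a hypothetical SNN-weighted stand-in chosen for illustration; it is not the density defined in the paper. The paper's density-decay micro-clusters and its two-stage assignment are not reproduced. All function names are illustrative.

```python
import math


def knn_sets(points, k):
    """For each point, the set of indices of its k nearest neighbours (itself excluded)."""
    n = len(points)
    sets = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        sets.append(set(others[:k]))
    return sets


def snn_similarity(knn, i, j):
    """Shared-nearest-neighbour similarity: number of neighbours common to i and j."""
    return len(knn[i] & knn[j])


def snn_density(points, k):
    """HYPOTHETICAL SNN-weighted local density, for illustration only (not the paper's formula).

    Each kNN neighbour j contributes snn_similarity(i, j) / (1 + d(i, j)), so points
    that share many neighbours and lie close together get a higher density.
    """
    knn = knn_sets(points, k)
    rho = [sum(snn_similarity(knn, i, j) / (1.0 + math.dist(points[i], points[j]))
               for j in knn[i])
           for i in range(len(points))]
    return rho, knn


def delta_and_parent(points, rho):
    """Classical DPC: delta[i] is the distance to the nearest point of higher density,
    and parent[i] is that point. Ties in rho are broken by the stable sort order."""
    n = len(points)
    order = sorted(range(n), key=lambda i: -rho[i])
    delta, parent = [0.0] * n, [-1] * n
    top = order[0]
    # The global density peak gets its largest distance to any other point.
    delta[top] = max(math.dist(points[top], points[j]) for j in range(n))
    for rank in range(1, n):
        i = order[rank]
        nearest = min(order[:rank], key=lambda j: math.dist(points[i], points[j]))
        delta[i] = math.dist(points[i], points[nearest])
        parent[i] = nearest
    return delta, parent, order


def select_centers(rho, delta, order, n_clusters):
    """Classical DPC takes n_clusters from the user. Centers are the points with the
    largest gamma = rho * delta. The global density peak is always kept, so every
    parent chain ends at a center."""
    gamma = [r * d for r, d in zip(rho, delta)]
    rest = sorted((i for i in range(len(rho)) if i != order[0]), key=lambda i: -gamma[i])
    return [order[0]] + rest[:n_clusters - 1]


def assign(order, parent, centers):
    """Classical one-pass DPC assignment. In decreasing density order, each point
    inherits its parent's label. A wrong label is therefore passed on to every point
    downstream of it, which is the chain reaction described above."""
    label = [-1] * len(order)
    for c_id, c in enumerate(centers):
        label[c] = c_id
    for i in order:
        if label[i] == -1:
            label[i] = label[parent[i]]
    return label


# Usage: two well-separated square blobs, each with a point at its middle.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
       (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
rho, knn = snn_density(pts, k=3)
delta, parent, order = delta_and_parent(pts, rho)
centers = select_centers(rho, delta, order, n_clusters=2)
labels = assign(order, parent, centers)
print(centers, labels)  # [4, 9] [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

In this sketch, the manual `n_clusters=2` and the single parent-following pass correspond to two of the DPC defects listed in the abstract. On these well-separated blobs the one-pass assignment happens to be correct. On overlapping clusters, however, one misassigned high-density point would drag all of its descendants in the parent chain into the wrong cluster.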

Keywords: shared nearest neighbor; density peak clustering; assignment strategy; cluster center; density decay

CLC number: TP391 [Automation and Computer Technology / Computer Application Technology]