Density-Peak Clustering Algorithm on Decentralized and Weighted Clusters Merging (去中心化加权簇归并的密度峰值聚类算法)  Cited by: 4


Authors: ZHAO Liheng (赵力衡), WANG Jian (王建), CHEN Hongjun (陈虹君) [1]

Affiliations: [1] Department of Electronic Information Engineering, Chengdu Jincheng College, Chengdu 611731, China; [2] School of Computer, Sichuan University, Chengdu 610041, China

Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), 2022, No. 8, pp. 1910-1922 (13 pages)

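Funding: Collaborative Education Project of the Ministry of Education (201902005069); Key R&D Project of the Science and Technology Department of Sichuan Province (22ZDYF0724).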

Abstract: Clustering by fast search and find of density peaks (DPC) is a density-based clustering algorithm proposed in recent years. It is simple in principle, requires no iteration, and can find clusters of arbitrary shape. However, the algorithm still has several defects: because it clusters around cluster centers, its results depend heavily on those centers, and the number of centers must still be specified manually; the cutoff distance considers only the distribution density of the data and ignores its internal features; and if a sample is misassigned during clustering, the samples assigned after it inherit the error. To address these problems, this paper proposes a density-peak clustering algorithm on decentralized and weighted clusters merging (DCM-DPC). The algorithm introduces a weight coefficient to redefine the local density and uses it to partition core sample groups located in different local high-density regions; these groups replace cluster centers as the basis for clustering. Finally, according to the mode of the clusters to which their neighboring samples belong, the remaining samples are either assigned to the cluster represented by the most strongly coupled core sample group or labeled as discrete points, which completes the clustering. Experiments on synthetic and UCI datasets show that the proposed algorithm clusters better than the comparison algorithms and partitions the boundary samples of entangled clusters more precisely.
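For readers unfamiliar with the baseline, the two quantities that the original DPC computes for each sample can be sketched as follows: the cutoff-kernel local density (the number of samples closer than the cutoff distance) and the distance to the nearest sample of higher density. This is only a minimal illustration of standard DPC, not of DCM-DPC; the abstract does not give the paper's weighted density or its coupling measure, so neither is reproduced here. The function names, the parameters `d_c` and `k`, and the neighbor-mode helper are hypothetical and chosen for this example.

```python
import math
from collections import Counter


def pairwise_distances(points):
    """Return the full Euclidean distance matrix as a list of lists."""
    return [[math.dist(p, q) for q in points] for p in points]


def local_density(dist, d_c):
    """Cutoff-kernel density of standard DPC: count samples closer than d_c."""
    n = len(dist)
    return [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
            for i in range(n)]


def delta_distance(dist, rho):
    """Distance from each sample to its nearest sample of strictly higher density.

    Samples with no denser sample (the density peaks, including ties for the
    maximum) receive their largest distance to any other sample instead.
    """
    n = len(dist)
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return delta


def assign_by_neighbor_mode(dist, labels, i, k):
    """Hypothetical helper (an assumption, not the paper's rule): give sample i
    the most common label among its k nearest neighbors that already have one.
    Returns None when none of them is labeled, i.e. a candidate discrete point.
    """
    order = sorted((j for j in range(len(dist)) if j != i),
                   key=lambda j: dist[i][j])
    votes = Counter(labels[j] for j in order[:k] if labels[j] is not None)
    return votes.most_common(1)[0][0] if votes else None


# Two small, well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10)]
dist = pairwise_distances(points)
rho = local_density(dist, d_c=1.5)
delta = delta_distance(dist, rho)
```

In standard DPC, samples with both large `rho` and large `delta` are taken as cluster centers; the abstract states that DCM-DPC replaces those centers with core sample groups.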

Keywords: density peak; clustering; decentralization; neighborhood; cluster merging

Classification number: TP301 [Automation and Computer Technology: Computer System Architecture]

 
