Density Peaks Clustering Algorithm Based on Representative Points and K-nearest Neighbors  Cited by: 8


Authors: ZHANG Qing-Hua[1,2]; ZHOU Jing-Peng; DAI Yong-Yang; WANG Guo-Yin[1,2]

Affiliations: [1] Key Laboratory of Tourism Multisource Data Perception and Decision, Ministry of Culture and Tourism (Chongqing University of Posts and Telecommunications), Chongqing 400065, China; [2] Chongqing Key Laboratory of Computational Intelligence (Chongqing University of Posts and Telecommunications), Chongqing 400065, China

Source: Journal of Software (《软件学报》), 2023, No. 12, pp. 5629-5648 (20 pages)

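Funding: National Key Research and Development Program of China (2020YFC2003502); National Natural Science Foundation of China (61876201); Natural Science Foundation of Chongqing (cstc2019jcyj-cxttX0002, cstc2021ycjh-bgzxm0013); Key Cooperation Project of Chongqing Municipal Education Commission (HZ2021008).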

Abstract: Density peaks clustering (DPC) is a density-based clustering algorithm that can intuitively determine the number of clusters, identify clusters of arbitrary shape, and automatically detect and exclude outliers. However, DPC has two shortcomings. First, it considers only the global distribution of the data, so it performs poorly on datasets whose clusters differ widely in density. Second, its point allocation strategy is prone to a "domino effect". To address these problems, this paper proposes RKNN-DPC, a DPC algorithm based on representative points and K-nearest neighbors (KNN). First, a KNN density is constructed, and representative points are introduced to describe the global distribution of the samples, yielding a new local density. Then, the KNN information of the samples is used to build a weighted KNN allocation strategy that mitigates the domino effect. Finally, RKNN-DPC is compared with five clustering algorithms on artificial and real-world datasets. The experimental results show that RKNN-DPC identifies cluster centers more accurately and obtains better clustering results.

Keywords: cluster analysis; density peaks clustering; representative points; K-nearest neighbors (KNN)

Classification number: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
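
Illustrative sketch: the abstract describes the baseline DPC pipeline (local density, selection of cluster centers, allocation of the remaining points) and says that RKNN-DPC replaces the density with a KNN-based one. The Python sketch below shows only that baseline pipeline with a simple KNN-style density. It is not the RKNN-DPC method: the abstract gives no formulas for the representative points, the new local density, or the weighted KNN allocation, so none of them are reproduced here. The function name `dpc_knn_sketch`, the parameter `k`, and the exponential form of the density are assumptions made for illustration.

```python
import numpy as np


def dpc_knn_sketch(X, n_clusters, k=5):
    """Baseline density peaks clustering with an assumed KNN-style density.

    Illustrative only; this is not the RKNN-DPC algorithm from the paper.
    """
    n = X.shape[0]

    # Pairwise Euclidean distances between all samples.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))

    # Assumed KNN density: exp(-mean squared distance to the k nearest
    # neighbors). Column 0 of the sorted row is the point itself.
    knn_dist = np.sort(dist, axis=1)[:, 1:k + 1]
    rho = np.exp(-(knn_dist ** 2).mean(axis=1))

    # delta_i: distance from point i to its nearest higher-density point.
    # The densest point has no such neighbor and gets its largest distance.
    order = np.argsort(-rho, kind="stable")  # indices by decreasing density
    delta = np.zeros(n)
    parent = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = dist[i].max()
            continue
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        parent[i] = j

    # Cluster centers: the n_clusters points with the largest rho * delta.
    # Ranking within `order` breaks ties in favor of the denser point.
    gamma = rho * delta
    centers = order[np.argsort(-gamma[order], kind="stable")[:n_clusters]]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)

    # Baseline allocation: each remaining point copies the label of its
    # nearest higher-density neighbor. One wrong label is inherited by every
    # point downstream of it, which is the "domino effect" in the abstract.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two well-separated synthetic blobs (made-up data, not from the paper).
    X = np.vstack([rng.normal(0.0, 0.5, (30, 2)),
                   rng.normal(10.0, 0.5, (30, 2))])
    labels, centers = dpc_knn_sketch(X, n_clusters=2, k=5)
    print(labels)
```

The allocation loop assigns each point from a single neighbor, which is where the error propagation arises; according to the abstract, RKNN-DPC replaces this step with a weighted strategy that uses the K nearest neighbors.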

 
