Density Peaks Clustering Algorithm Based on Representative Points and K-nearest Neighbors (基于代表点与K近邻的密度峰值聚类算法)

Cited by: 8

Authors: ZHANG Qing-Hua (张清华) [1,2]; ZHOU Jing-Peng (周靖鹏); DAI Yong-Yang (代永杨); WANG Guo-Yin (王国胤) [1,2]

Affiliations: [1] Key Laboratory of Tourism Multisource Data Perception and Decision, Ministry of Culture and Tourism (Chongqing University of Posts and Telecommunications), Chongqing 400065, China; [2] Chongqing Key Laboratory of Computational Intelligence (Chongqing University of Posts and Telecommunications), Chongqing 400065, China

Source: Journal of Software (《软件学报》), 2023, Issue 12, pp. 5629-5648 (20 pages)

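Funding: National Key Research and Development Program of China (2020YFC2003502); National Natural Science Foundation of China (61876201); Natural Science Foundation of Chongqing (cstc2019jcyj-cxttX0002, cstc2021ycjh-bgzxm0013); Key Cooperation Project of the Chongqing Municipal Education Commission (HZ2021008).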

Abstract: Density peaks clustering (DPC) is a density-based clustering algorithm that can intuitively determine the number of clusters, identify clusters of arbitrary shape, and automatically detect and exclude outliers. However, DPC still has shortcomings. On the one hand, it considers only the global distribution, so it performs poorly on datasets whose clusters differ greatly in density. On the other hand, its point allocation strategy is prone to a "domino effect". To address these issues, this study proposes RKNN-DPC, a DPC algorithm based on representative points and K-nearest neighbors (KNN). First, a KNN density is constructed, and representative points are introduced to characterize the global distribution of the samples, yielding a new local density. Then, the KNN information of the samples is used to build a weighted KNN allocation strategy that mitigates the domino effect. Finally, RKNN-DPC is compared with five clustering algorithms on synthetic and real-world datasets. The experimental results show that RKNN-DPC identifies cluster centers more accurately and obtains better clustering results.
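For orientation, the baseline DPC procedure that the abstract critiques can be sketched roughly as below. This is a minimal illustration of standard DPC as it is commonly described (cutoff-kernel density, distance to the nearest denser point, chained label assignment), not the RKNN-DPC method: the abstract gives no formulas for the KNN density, the representative points, or the weighted allocation, so none of those are implemented here. The names `dpc_baseline`, `d_c`, and `n_clusters` are hypothetical, and picking centers by the largest `rho * delta` is one common heuristic assumed for the sketch.

```python
import numpy as np

def dpc_baseline(X, d_c, n_clusters):
    """Baseline DPC sketch (assumed form, not RKNN-DPC)."""
    n = len(X)
    # Pairwise Euclidean distances between all samples.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Local density: number of other points closer than the cutoff d_c.
    rho = (dist < d_c).sum(axis=1) - 1
    # Visit points from densest to sparsest; ties are broken by index.
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)
    parent = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            # Convention for the densest point: its largest distance.
            delta[i] = dist[i].max()
            continue
        higher = order[:rank]  # points ranked as denser than i
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        parent[i] = j
    # Centers: largest gamma = rho * delta (assumed selection heuristic).
    gamma = rho * delta
    gamma[order[0]] = np.inf  # the densest point is always a center
    centers = np.argsort(-gamma, kind="stable")[:n_clusters]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    # Chained assignment: each point copies the label of its nearest
    # denser point, which has already been labeled at this stage.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return rho, delta, labels
```

The final loop is where the domino effect arises: every point inherits the label of a single denser neighbor, so one wrong assignment is passed on to every point that chains through it.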

Keywords: cluster analysis; density peaks clustering; representative points; K-nearest neighbors (KNN)

Classification code: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
