Density peaks clustering algorithm with K-nearest neighbors and weighted similarity

Times cited: 19

Authors: ZHAO Jia[1], CHEN Lei, WU Run-xiu[1], ZHANG Bo, HAN Long-zhe

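Affiliations: [1] School of Information Engineering, Nanchang Institute of Technology, Nanchang, Jiangxi 330099, China; [2] Global Energy Internet Research Institute Co., Ltd., Nanjing, Jiangsu 210003, China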

Source: Control Theory & Applications, 2022, no. 12, pp. 2349-2357 (9 pages)

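Funding: Supported by the National Natural Science Foundation of China (52069014, 61962036) and the Jiangxi Provincial Fund for Distinguished Young Scholars (2018ACB21029).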

Abstract: The local density definition in the density peaks clustering (DPC) algorithm ignores the differences in sample density between clusters in datasets with uneven density distributions, which easily leads to wrongly selected cluster centers. Its allocation strategy assigns samples in a chain through the density peaks according to Euclidean distance; because manifold data usually contain many samples that lie far from their density peak, a large number of samples that should belong to the same cluster are misassigned to other clusters, and clustering accuracy suffers. To address these problems, this paper proposes a density peaks clustering algorithm with K-nearest neighbors and weighted similarity. The algorithm redefines the local density of each sample from its K-nearest-neighbor information; this definition can adjust the magnitude of the local density and locates the density peaks accurately. The similarity between samples is defined from their shared nearest neighbors and natural nearest neighbors, which removes the dependence of the allocation strategy on Euclidean distance and avoids the cascading errors that the allocation strategy would otherwise propagate. Comparative experiments on manifold datasets and on datasets with uneven density distributions show that the proposed algorithm accurately finds the density peaks of datasets whose clusters differ greatly in density, avoids cascading misallocation on manifold data, and achieves satisfactory clustering results. It also performs very well on real-world datasets.
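The abstract does not state the paper's density formula. As a rough illustration of how a K-nearest-neighbor local density plugs into the density peaks framework, the sketch below assumes the widely used DPC-KNN form ρ_i = exp(-(1/K) Σ_{j∈KNN(i)} d_ij²) as a stand-in, then computes the relative distance δ_i (distance to the nearest denser point) and picks centers by the decision value γ_i = ρ_i·δ_i as in the original DPC. The helper names (`knn_indices`, `knn_density`, `relative_distance`, `pick_centers`) and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def knn_indices(points: List[Point], k: int) -> List[List[int]]:
    """Indices of the k nearest neighbours of each point (self excluded, ties broken by index)."""
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")
    return [
        sorted((j for j in range(n) if j != i),
               key=lambda j: (math.dist(points[i], points[j]), j))[:k]
        for i in range(n)
    ]


def knn_density(points: List[Point], k: int) -> List[float]:
    """ASSUMED KNN local density (DPC-KNN form, not necessarily the paper's formula):
    rho_i = exp(-(1/k) * sum of squared distances to the k nearest neighbours)."""
    rho = []
    for i, nbrs in enumerate(knn_indices(points, k)):
        mean_sq = sum(math.dist(points[i], points[j]) ** 2 for j in nbrs) / k
        rho.append(math.exp(-mean_sq))
    return rho


def relative_distance(points: List[Point], rho: List[float]) -> Tuple[List[float], List[int]]:
    """delta_i = distance to the nearest point of higher density (equal densities broken
    by index); the densest point gets its largest distance to any point and parent -1."""
    n = len(points)
    delta, parent = [], []
    for i in range(n):
        higher = [j for j in range(n) if (rho[j], -j) > (rho[i], -i)]
        if higher:
            j = min(higher, key=lambda j: math.dist(points[i], points[j]))
            delta.append(math.dist(points[i], points[j]))
            parent.append(j)
        else:
            delta.append(max(math.dist(points[i], q) for q in points))
            parent.append(-1)
    return delta, parent


def pick_centers(rho: List[float], delta: List[float], c: int) -> List[int]:
    """Choose the c points with the largest decision value gamma_i = rho_i * delta_i."""
    gamma = [r * d for r, d in zip(rho, delta)]
    return sorted(range(len(rho)), key=lambda i: gamma[i], reverse=True)[:c]


# Usage: a tight cluster next to a much sparser one (the uneven-density case).
dense = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
sparse = [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0), (6.0, 6.0), (5.5, 5.5)]
data = dense + sparse
rho = knn_density(data, k=3)
delta, parent = relative_distance(data, rho)
print(pick_centers(rho, delta, c=2))  # [4, 9]: the middle point of each blob
```

In this toy data the sparse blob's middle point is still chosen because its large δ compensates for its lower ρ. The paper describes its own density as adjustable so that peaks in sparse clusters are found reliably; this sketch does not reproduce that adjustment.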

Keywords: density peaks clustering; local density; K-nearest neighbors; shared nearest neighbors; natural nearest neighbors

CLC number: TP311.13 [Automation and Computer Technology: Computer Software and Theory]
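The allocation step described in the abstract relies on shared nearest neighbors (SNN) and natural nearest neighbors, but the abstract does not give the weighting that combines them. The sketch below shows only those two ingredients, under stated assumptions: SNN similarity is taken as the fraction of K nearest neighbors two points share, and natural neighbors follow one common formulation (grow r until every point appears in some other point's r-nearest-neighbor list, or the number of points without such a reverse neighbor stops changing; natural neighbors are then mutual r-nearest neighbors). The function names and data are hypothetical, and the paper's weighted similarity itself is not reproduced.

```python
import math
from typing import List, Sequence, Set, Tuple

Point = Sequence[float]


def neighbour_order(points: List[Point]) -> List[List[int]]:
    """For each point, all other indices sorted by distance (ties broken by index)."""
    n = len(points)
    return [
        sorted((j for j in range(n) if j != i),
               key=lambda j: (math.dist(points[i], points[j]), j))
        for i in range(n)
    ]


def snn_similarity(order: List[List[int]], i: int, j: int, k: int) -> float:
    """ASSUMED shared-nearest-neighbour similarity: |kNN(i) & kNN(j)| / k."""
    return len(set(order[i][:k]) & set(order[j][:k])) / k


def natural_neighbours(points: List[Point]) -> Tuple[int, List[Set[int]]]:
    """Natural-neighbour search (one common formulation, assumed here).
    Returns the final search radius r and, for each i, {j : j in kNN_r(i) and i in kNN_r(j)}."""
    order = neighbour_order(points)
    n = len(points)
    reverse_count = [0] * n          # how many points list each point among their r nearest
    previous_orphans = None
    r = 0
    while r < n - 1:
        r += 1
        for i in range(n):
            reverse_count[order[i][r - 1]] += 1  # point i now also lists its r-th neighbour
        orphans = reverse_count.count(0)         # points nobody lists yet
        if orphans == 0 or orphans == previous_orphans:
            break
        previous_orphans = orphans
    mutual = [{j for j in order[i][:r] if i in order[j][:r]} for i in range(n)]
    return r, mutual


# Usage on four evenly spaced points on a line.
line = [(0.0,), (1.0,), (2.0,), (3.0,)]
order = neighbour_order(line)
print(snn_similarity(order, 0, 1, k=2))  # 0.5: points 0 and 1 share neighbour 2
print(natural_neighbours(line))          # (2, [{1}, {0, 2}, {1, 3}, {2}])
```

The natural-neighbor search takes no K from the user; the radius r is derived from the data, which is the usual motivation for combining it with a fixed-K notion such as SNN.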
