Authors: ZHAO Jia (赵嘉)[1], CHEN Lei (陈磊), WU Run-xiu (吴润秀)[1], ZHANG Bo (张波), HAN Long-zhe (韩龙哲)
Affiliations: [1] School of Information Engineering, Nanchang Institute of Technology, Nanchang, Jiangxi 330099, China; [2] Global Energy Internet Research Institute Co., Ltd, Nanjing, Jiangsu 210003, China
Source: Control Theory & Applications (《控制理论与应用》), 2022, No. 12, pp. 2349-2357 (9 pages)
Funding: Supported by the National Natural Science Foundation of China (52069014, 61962036) and the Jiangxi Provincial Outstanding Youth Fund (2018ACB21029).
Abstract: The local density definition of the density peaks clustering algorithm does not account for differences in sample density between clusters in data with uneven density distribution, which easily leads to wrongly selected cluster centers. Its allocation strategy assigns samples in a chain through the density peaks according to Euclidean distance, but manifold data usually contain many samples that lie far from their density peak. As a result, a large number of samples that should belong to the same cluster are assigned to other clusters, and clustering accuracy suffers. To address this, the paper proposes a density peaks clustering algorithm based on K-nearest neighbors and weighted similarity. The algorithm redefines the local density of a sample from its K-nearest-neighbor information; this definition can adjust the magnitude of the local density and find the density peaks accurately. It defines the similarity between samples from their shared nearest neighbors and natural nearest neighbors, which removes the influence of Euclidean distance on the allocation strategy and avoids the cascading errors that the allocation strategy would otherwise produce. Comparative experiments on manifold datasets and on datasets with uneven density distribution show that the algorithm accurately finds the density peaks of datasets whose clusters differ greatly in density, avoids cascading misallocation on manifold data, and achieves satisfactory clustering results. Its clustering results on real datasets are also excellent.
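The quantities named in the abstract can be illustrated with a minimal Python sketch. This is not the paper's method: the record gives no formulas, so the exponential KNN density below is an assumed, commonly used form, and the function names (`knn_indices`, `knn_density`, `delta_distance`, `snn_counts`) are hypothetical. The distance `delta` follows the standard density peaks clustering definition (distance to the nearest sample of higher density). Natural nearest neighbors and the weighted similarity itself are not reproduced here.

```python
import numpy as np

def pairwise_dist(X):
    """Euclidean distance matrix for samples in the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def knn_indices(D, k):
    """Indices of the k nearest neighbors of each sample, excluding itself."""
    D_no_self = D.copy()
    np.fill_diagonal(D_no_self, np.inf)  # a sample is never its own neighbor
    return np.argsort(D_no_self, axis=1)[:, :k]

def knn_density(D, knn):
    """ASSUMED density: sum of exp(-distance) over the k nearest neighbors."""
    rows = np.arange(D.shape[0])[:, None]
    return np.exp(-D[rows, knn]).sum(axis=1)

def delta_distance(D, rho):
    """Standard DPC delta: distance to the nearest sample of higher density.

    A sample with no denser sample gets its largest distance instead.
    """
    delta = np.empty(len(rho))
    for i in range(len(rho)):
        higher = rho > rho[i]
        delta[i] = D[i, higher].min() if higher.any() else D[i].max()
    return delta

def snn_counts(knn, n):
    """Number of shared nearest neighbors for every pair of samples."""
    member = np.zeros((n, n), dtype=int)
    member[np.arange(n)[:, None], knn] = 1  # member[i, j] = 1 if j is in KNN(i)
    return member @ member.T
```

Samples with a large product `rho * delta` are the usual candidates for density peaks, and a shared-neighbor count of zero between two samples suggests they should not be linked during allocation, regardless of their Euclidean distance to a peak.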
Keywords: density peaks clustering; local density; K-nearest neighbors; shared nearest neighbors; natural nearest neighbors
Classification No.: TP311.13 [Automation and Computer Technology: Computer Software and Theory]