Optimized Density Peaks Clustering Algorithm Based on Dissimilarity Measure

Cited by: 30

Authors: DING Shi-Fei [1,2]; XU Xiao; WANG Yan-Ru

Affiliations: [1] School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China; [2] Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China

Source: Journal of Software (软件学报), 2020, Issue 11, pp. 3321-3333 (13 pages)

Funding: National Natural Science Foundation of China (61672522, 61379101); National Key Basic Research Program of China (973 Program) (2013CB329502).

Abstract: Clustering by fast search and find of density peaks (DPC) is an efficient algorithm that quickly finds cluster centers based on two per-point attributes, local density and relative distance. DPC uses a decision graph to identify density peaks as cluster centers; it does not require the number of clusters to be specified in advance, and it can recover clusters of arbitrary shape. However, both local density and relative distance are computed from a similarity matrix that relies simply on a distance metric, so DPC performs unsatisfactorily on complex datasets, especially those with uneven density or high dimensionality. In addition, DPC has no unified measure of local density: different datasets call for different measures. Third, the cutoff distance dc is determined only from the global distribution of the data and ignores local information, so changes in dc affect the clustering result, especially on small datasets. To address these shortcomings, this study proposes an optimized density peaks clustering algorithm based on dissimilarity measure (DDPC). DDPC introduces a mass-based dissimilarity measure to compute the similarity matrix, derives each sample's k-nearest-neighbor information from the new matrix, and then redefines local density in terms of that k-nearest-neighbor information. Experimental results on classical datasets show that DDPC outperforms the DPC variants FKNN-DPC and DPC-KNN and achieves satisfactory results on datasets with uneven density and high dimensionality. It also unifies the measurement of local density and avoids the influence of dc on the clustering results seen in the traditional DPC algorithm.

Keywords: density peaks clustering; local density; decision graph; dissimilarity measure; uneven density

Classification: TP301 [Automation and Computer Technology: Computer Systems Architecture]
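
The abstract describes the two per-point quantities of DPC only in words. As a rough illustration, the Python sketch below computes them in the classic cutoff-kernel form: local density as the number of points closer than dc, and relative distance as the distance to the nearest point of higher density. This is baseline DPC, not the DDPC method proposed in the paper, whose mass-based dissimilarity and k-nearest-neighbor density are not spelled out in the abstract. The function names and toy data are hypothetical, and ranking by the product of density and distance is an assumed shortcut standing in for visual inspection of the decision graph.

```python
import math

def pairwise_distances(points):
    """Euclidean distance matrix for a list of equal-length tuples."""
    return [[math.dist(p, q) for q in points] for p in points]

def dpc_quantities(dist, dc):
    """Return (rho, delta) for classic DPC with the cutoff kernel.

    rho[i]   = number of other points closer than dc to point i
    delta[i] = distance to the nearest point with strictly higher density;
               a point with no higher-density neighbor gets its largest distance.
    """
    n = len(dist)
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc)
           for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta

def pick_centers(rho, delta, n_centers):
    """Rank points by gamma = rho * delta (assumed stand-in for the decision graph)."""
    gamma = [r * d for r, d in zip(rho, delta)]
    order = sorted(range(len(gamma)), key=lambda i: gamma[i], reverse=True)
    return order[:n_centers]

# Toy data (not from the paper): two small, well-separated groups,
# each a square of four points with one point in the middle.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1), (5.05, 5.05)]
dist = pairwise_distances(points)
rho, delta = dpc_quantities(dist, dc=0.12)
centers = pick_centers(rho, delta, n_centers=2)
```

On this toy data the two middle points have both the highest density and a large relative distance, so they are the ones selected as centers. Any other dissimilarity matrix could be passed to `dpc_quantities` in place of the Euclidean one, since the function only reads `dist`.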
