Optimization of clustering by fast search and find of density peaks  Cited by: 34


Authors: Jiang Liqing (蒋礼青), Zhang Mingxin (张明新) [2], Zheng Jinlong (郑金龙) [2], Dai Jiao (戴娇) [1,2], Shang Zhaowei (尚赵伟) [3]

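Affiliations: [1] School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116; [2] School of Computer Science and Engineering, Changshu Institute of Technology, Changshu, Jiangsu 215500; [3] College of Computer Science, Chongqing University, Chongqing 400030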

Source: Application Research of Computers (《计算机应用研究》), 2016, No. 11, pp. 3251-3254 (4 pages)

Funding: National Natural Science Foundation of China (61173130)

Abstract: CFSFDP (clustering by fast search and find of density peaks) is a recent density-based clustering algorithm that can cluster non-spherical data sets and is both fast and simple to implement. However, its density threshold dc must be found by manual trial and error, and it cannot correctly cluster data in which a single class contains multiple density peaks. To address these shortcomings, this paper proposes NM-CFSFDP, an optimization of CFSFDP based on the neighbor distance curve and cluster merging. The algorithm first determines dc automatically from the change in the nearest-neighbor distance curve and then clusters the data with CFSFDP using that dc. It next merges the classes that can be merged, guided by the same method used to compute dc, and introduces a cohesion measure so that a merge can be dynamically revoked, which addresses the problem that merges otherwise cannot be undone. This allows data with multiple density peaks to be clustered correctly. Comparative experiments show that NM-CFSFDP clusters more accurately than CFSFDP.
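
For orientation, the baseline CFSFDP procedure that the abstract builds on can be sketched roughly as follows. This is a minimal illustration of the generally known density-peaks scheme, not the authors' code and not NM-CFSFDP. The function name `density_peaks`, the parameter `n_clusters`, and the choice of picking centers by the largest product of density and distance are assumptions made for this example. The value of dc is passed in by hand, which is the manual step the paper aims to automate; the curve-based rule for choosing dc and the merging step are not reproduced because the abstract does not specify them.

```python
import math


def density_peaks(points, dc, n_clusters):
    """Baseline CFSFDP sketch with a fixed dc; returns one cluster label per point."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density rho_i: number of other points closer than dc.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc) for i in range(n)]

    # Visit points from densest to sparsest (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    # delta_i: distance to the nearest point earlier in the density order.
    # The densest point has no such neighbor, so it gets its largest distance.
    delta = [0.0] * n
    parent = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i], parent[i] = dist[i][j], j

    # Assumption for this sketch: centers are the n_clusters points with the
    # largest rho * delta (high density and far from any denser point).
    centers = sorted(range(n), key=lambda i: (-rho[i] * delta[i], i))[:n_clusters]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c

    # Every other point inherits the label of its nearest denser neighbor.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels


if __name__ == "__main__":
    # Two small, well-separated groups; dc chosen by hand for the example.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
           (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
    print(density_peaks(pts, dc=1.0, n_clusters=2))
```

Because each point simply follows its nearest denser neighbor, a class that contains several density peaks can be split into several clusters, which is the failure case the merging step in NM-CFSFDP is meant to handle.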

Keywords: clustering; density peaks; neighbor distance curve; cluster merging

Classification: TP301.6 [Automation and Computer Technology: Computer System Architecture]

 
