Optimization of fast clustering algorithm based on density peaks  Cited by: 8

Authors: DAI Jiao[1,2], ZHANG Ming-xin[2], ZHENG Jin-long[2], JIANG Li-qing, SHANG Zhao-wei[3]

Affiliations: [1] School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, Jiangsu, China; [2] School of Computer Science and Engineering, Changshu Institute of Technology, Changshu 215500, Jiangsu, China; [3] College of Computer Science, Chongqing University, Chongqing 400030, China

Source: Computer Engineering and Design (计算机工程与设计), 2016, No. 11, pp. 2979-2984 (6 pages)

Funding: National Natural Science Foundation of China (61173130)

Abstract: CFSFDP specifies its global density threshold dc without considering the spatial distribution of the data, which degrades clustering quality, and it cannot accurately cluster data sets with multiple density peaks. To address these shortcomings, a clustering algorithm that optimizes CFSFDP through projection partitioning and class merging (PM-CFSFDP) is proposed. The data set is divided into partitions by projection analysis and each partition is clustered locally, which avoids the use of a global dc. A cohesion measure is then introduced to guide the merging of sub-classes, so that data sets with unevenly distributed density and inter-class spacing, or with multiple density peaks, are clustered accurately. Simulation results on 4 typical data sets show that PM-CFSFDP clusters more accurately than CFSFDP and AGD-DBSCAN.
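To make the role of the global threshold dc concrete, the following is a minimal sketch of the two quantities used by the baseline density-peaks method (CFSFDP): the local density rho and the distance delta to the nearest point of higher density. It is not the PM-CFSFDP algorithm from the paper; the projection partitioning and cohesion-based merging steps are not specified in this abstract and are not reproduced here. The function name `density_peaks`, the cutoff-kernel density, and the handling of ties are assumptions made for illustration.

```python
import numpy as np

def density_peaks(points, dc):
    """Return (rho, delta) for the baseline density-peaks method.

    Assumption: a hard cutoff kernel, i.e. rho[i] counts the other
    points that lie closer than dc to point i.
    """
    pts = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances, shape (n, n).
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    # Local density; subtract 1 so a point does not count itself.
    rho = (dist < dc).sum(axis=1) - 1
    delta = np.empty(len(pts))
    for i in range(len(pts)):
        higher = rho > rho[i]
        if higher.any():
            # Distance to the nearest point with strictly higher density.
            delta[i] = dist[i, higher].min()
        else:
            # No denser point exists (ties included): use the largest distance.
            delta[i] = dist[i].max()
    return rho, delta
```

With one tight group of points and one loose group, a small dc can leave every point in the loose group with rho equal to 0, while a larger dc gives them nonzero density. This sensitivity of a single global dc to regions of different density is the limitation the abstract describes.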

Keywords: clustering; density threshold; density peaks; projection partition; class merging

Classification code: TP312 [Automation and Computer Technology: Computer Software and Theory]

 
