Density peak clustering combining local truncation distance and small clusters merging


Authors: CHEN Sugen [1,2]; ZHAO Zhizhong [1,2]

Affiliations: [1] School of Mathematics and Physics, Anqing Normal University, Anqing 246133, Anhui, China; [2] Key Laboratory of Modeling, Simulation and Control of Complex Ecosystem in Dabie Mountains of Anhui Higher Education Institutes, Anqing 246133, Anhui, China

Source: Journal of Shandong University (Engineering Science), 2025, No. 2, pp. 58-70 (13 pages)

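Funding: National Natural Science Foundation of China, Youth Fund (61702012); Natural Science Foundation of Anhui Province, General Program (2008085MF193); Key Scientific Research Project of Anhui Higher Education Institutions (2024AH051095).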

Abstract: The truncation distance defined by the density peak clustering algorithm considers only the global distribution of samples, and the sample assignment step is prone to the "domino" phenomenon. To address these problems, a density peak clustering algorithm combining local truncation distance and small clusters merging is proposed. First, the truncation distance and local density of each sample are calculated from the local distribution of the samples, which helps to identify density peaks accurately on datasets with complex structure. Second, potential density peaks are selected according to the differences between sample decision values, and multiple small clusters are formed. Third, a new similarity between small clusters is defined, and the small clusters are merged according to this similarity to obtain the clustering result, which effectively avoids the "domino" phenomenon. The algorithm was validated on six synthetic datasets and eight UCI datasets. Averaged over these 14 datasets, its normalized mutual information, adjusted Rand index, and adjusted mutual information were 18.15%, 28.99%, and 20.22% higher, respectively, than the average of the five comparison algorithms, and 30.06%, 47.15%, and 31.90% higher than those of the original density peak clustering algorithm, indicating good clustering performance.
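The abstract gives no formulas, so the following is only a minimal sketch of the quantities it names: a locally computed density, the distance to the nearest denser sample, and the decision value used to pick density peaks. The function name `knn_decision_values`, the parameter `k`, and the k-nearest-neighbour density are all assumptions made for illustration; they stand in for the paper's local truncation distance and are not the authors' definitions. The decision value follows the classic density peak clustering convention of multiplying density by distance. Selecting potential peaks by decision-value differences and merging small clusters are not shown, because the abstract does not define the similarity measure.

```python
import numpy as np

def knn_decision_values(X, k=5):
    """Return (rho, delta, gamma) for each row of X.

    ASSUMPTION: local density is exp(-mean squared distance to the k nearest
    neighbours). This is an illustrative stand-in, not the paper's formula.
    Requires k < number of samples.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Pairwise squared Euclidean distances, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    dist_sq = (diff ** 2).sum(axis=-1)
    dist = np.sqrt(dist_sq)

    # Column 0 of each sorted row is the sample itself (distance 0),
    # so columns 1..k hold the k nearest neighbours.
    knn_sq = np.sort(dist_sq, axis=1)[:, 1:k + 1]
    rho = np.exp(-knn_sq.mean(axis=1))

    # delta: distance to the nearest sample with strictly higher density;
    # a sample with no denser sample gets its largest distance instead.
    delta = np.empty(n)
    for i in range(n):
        higher = rho > rho[i]
        delta[i] = dist[i, higher].min() if higher.any() else dist[i].max()

    # Decision value as in classic density peak clustering.
    gamma = rho * delta
    return rho, delta, gamma
```

Samples with the largest `gamma` are candidate density peaks. In this sketch, samples that tie for the highest density are all treated as having no denser neighbour.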

Keywords: clustering; density peak clustering; truncation distance; local density; potential density peak

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
