Efficient similarity measure for density peaks clustering (面向密度峰值聚类的高效相似度度量)  Cited by: 1

Authors: WANG Lijuan (王丽娟) [1,2]; XU Xiao (徐晓) [1]; DING Shifei (丁世飞)

Affiliations: [1] School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, Jiangsu, China; [2] School of Information Engineering, Xuzhou College of Industrial Technology, Xuzhou 221114, Jiangsu, China

Source: Journal of Shandong University (Engineering Science) (《山东大学学报(工学版)》), 2024, Issue 3, pp. 12-21, 29 (11 pages in total)

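Funding: National Natural Science Foundation of China (62206296); Fundamental Research Funds for the Central Universities (2022QN1095); Jiangsu Province Advanced Training Program for Professional Leaders of Higher Vocational Colleges (2022GRFX063).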

Abstract: To address the high computational complexity of density peaks clustering (DPC), an efficient similarity measure (ESM) method for DPC is proposed. ESM builds an incomplete similarity matrix by measuring similarity only between nearest neighbors. The nearest neighbors are selected with the help of a randomly chosen third-party data object, so no additional parameter is introduced. Using the similarity matrix built by ESM, an improved efficient density peaks clustering (EDPC) algorithm is proposed that identifies cluster centers more efficiently than DPC while maintaining accuracy. Theoretical analysis and experimental results show that, by skipping similarity computations between objects that are certain to be dissimilar, ESM effectively improves the computational efficiency of DPC and of two of its variants, density peaks clustering based on K-nearest neighbors (DPC-KNN) and fuzzy weighted K-nearest neighbors density peaks clustering (FKNN-DPC), and that it scales well.
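The abstract does not spell out how the random third-party object is used. One plausible reading, offered here only as an assumption and not as the authors' ESM algorithm, is pivot-based pruning with the triangle inequality: for any pivot p, d(x, y) >= |d(x, p) - d(y, p)|, so a pair whose pivot distances differ by more than the DPC cutoff distance is certain to be dissimilar and its distance never needs to be computed. The sketch below applies that idea to the standard cutoff-kernel local density of DPC. The names `pivot_pruned_density`, `dc`, and `seed` are hypothetical and chosen for illustration.

```python
import numpy as np


def pivot_pruned_density(X, dc, seed=0):
    """Cutoff-kernel local density for DPC with pivot-based pruning.

    Illustrative sketch only (an assumed reading of the abstract, not the
    paper's ESM): a random pivot plays the role of the "third-party" object,
    and pairs it proves to be farther apart than dc are skipped.
    Returns (rho, number_of_pairwise_distances_actually_computed).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    pivot = X[rng.integers(n)]                    # random third-party object
    d_pivot = np.linalg.norm(X - pivot, axis=1)   # n distances to the pivot
    order = np.argsort(d_pivot)                   # points sorted by pivot distance
    sorted_d = d_pivot[order]

    rho = np.zeros(n, dtype=int)
    computed = 0
    for a in range(n):
        # Only points whose pivot distance is within dc of this point's can
        # be closer than dc (triangle inequality); look forward only so each
        # pair is examined once.
        hi = np.searchsorted(sorted_d, sorted_d[a] + dc, side="right")
        cand = order[a + 1:hi]
        if cand.size == 0:
            continue
        i = order[a]
        d = np.linalg.norm(X[cand] - X[i], axis=1)
        computed += cand.size
        close = cand[d < dc]
        rho[i] += close.size    # neighbors found for point i
        rho[close] += 1         # and i counts as a neighbor of each of them
    return rho, computed
```

For example, `rho, computed = pivot_pruned_density(X, dc=0.3)` on an `(n, d)` array `X` gives the same densities as a full pairwise computation while evaluating only the candidate pairs. How much work is saved depends on the data and on which pivot is drawn, and the remaining DPC steps (the distance to the nearest higher-density point and the selection of cluster centers) are not covered by this sketch.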

Keywords: density peaks clustering; cluster center; similarity matrix; computational complexity; large-scale dataset

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
