K-means text clustering based on weighted density Canopy (基于加权密度Canopy的K-means文本聚类)

Times cited: 2

Authors: SONG Jian, LI Yanfang[1], CHEN Zhanfang[1] (College of Computer Science and Technology, Changchun University of Science and Technology, Changchun 130022, China)

Affiliation: [1] College of Computer Science and Technology, Changchun University of Science and Technology, Changchun 130022

Source: Journal of South-Central University for Nationalities (Natural Science Edition), 2023, No. 5, pp. 636-642 (7 pages)

Funding: Science and Technology Program of Jilin Province (20200703003ZP).

Abstract: To address the low performance of existing text clustering methods, this paper proposes a K-means text clustering algorithm with improved centroid initialization. The algorithm first pre-clusters the texts with the Canopy algorithm. It improves Canopy's threshold-selection strategy and defines a weighted density for selecting Canopy centers, which yields a more accurate number of clusters and better initial cluster centers. These results are then used as the initialization parameters of K-means for the subsequent iterative clustering. This effectively avoids the local optima that the traditional algorithm falls into when its initial cluster centers are chosen at random, reduces the number of iterations, and improves clustering accuracy. Experimental results show that, compared with other algorithms of the same type, the proposed algorithm performs better in text clustering analysis.
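The abstract describes the pipeline only at a high level, so the following is a minimal Python sketch under stated assumptions, not the authors' implementation. The abstract does not define the weighted density or the thresholds; here the Canopy radius is assumed to be the mean pairwise distance, a point's density is assumed to be the number of neighbours inside that radius, and each new Canopy center is assumed to be the remaining point that maximizes density times distance to the nearest already-chosen center. The function names (`weighted_density_canopy`, `kmeans`) are hypothetical, and Euclidean distance on small dense vectors stands in for real document vectors such as TF-IDF.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_pairwise_distance(points: List[Vector]) -> float:
    """Average distance over all unordered pairs (assumed Canopy radius)."""
    n = len(points)
    dists = [euclidean(points[i], points[j]) for i in range(n) for j in range(i + 1, n)]
    return sum(dists) / len(dists) if dists else 0.0


def local_density(points: List[Vector], radius: float) -> List[int]:
    """Number of other points strictly inside `radius` of each point (assumed density)."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and euclidean(p, q) < radius)
        for i, p in enumerate(points)
    ]


def weighted_density_canopy(points: List[Vector]) -> List[int]:
    """Hypothetical weighted-density Canopy: returns the indices of the chosen centers."""
    radius = mean_pairwise_distance(points)
    dens = local_density(points, radius)
    remaining = set(range(len(points)))
    centers: List[int] = []

    def score(i: int) -> Tuple[float, int]:
        if not centers:
            # First center: the densest point (ties go to the lowest index).
            return (dens[i], -i)
        # Later centers: density weighted by distance to the nearest chosen center.
        nearest = min(euclidean(points[i], points[j]) for j in centers)
        return (dens[i] * nearest, -i)

    while remaining:
        c = max(remaining, key=score)
        centers.append(c)
        # Points inside the radius belong to this canopy and cannot become centers.
        remaining = {i for i in remaining if i != c and euclidean(points[i], points[c]) >= radius}
    return centers


def kmeans(points: List[Vector], init: List[Vector],
           max_iter: int = 100) -> Tuple[List[List[float]], List[int], int]:
    """Lloyd's K-means seeded with `init`; returns (centers, labels, iterations)."""
    centers = [list(c) for c in init]
    labels = [-1] * len(points)
    for it in range(1, max_iter + 1):
        # Assignment step: nearest center for every point.
        new_labels = [min(range(len(centers)), key=lambda k: euclidean(p, centers[k]))
                      for p in points]
        if new_labels == labels:
            return centers, labels, it  # converged: assignments unchanged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels, max_iter


if __name__ == "__main__":
    # Toy 2-D vectors standing in for document vectors (two obvious groups).
    docs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    seeds = weighted_density_canopy(docs)
    print(seeds)  # → [0, 4]
    centers, labels, iters = kmeans(docs, [docs[i] for i in seeds])
    print(labels, iters)  # → [0, 0, 0, 1, 1, 1] 2
```

In this sketch the number of Canopy centers becomes k, so K-means needs no user-supplied cluster count, and seeding from well-separated dense points is what lets it converge in few iterations on the toy data. Whether the paper also discards low-density outlier canopies is not stated in the abstract; this sketch keeps them.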

Keywords: text clustering; K-means algorithm; weighted density; Canopy algorithm

CLC number: TP301 [Automation and Computer Technology: Computer System Architecture]
