K-means text clustering algorithm based on density peaks  Cited by: 26


Authors: Tian Shixiao (田诗宵), Ding Lixin (丁立新) [1], Zheng Jinqiu (郑金秋) [1]

Affiliation: [1] School of Computer Science, Wuhan University, Wuhan, Hubei 430072, China

Source: Computer Engineering and Design (《计算机工程与设计》), 2017, No. 4, pp. 1019-1023 (5 pages)

Funding: National Natural Science Foundation of China (60975050); Fundamental Research Funds for the Central Universities (2452015197; 2452015194; 2452015200)

Abstract: In the traditional K-means algorithm, the random choice of initial centroids can trap the algorithm in a local optimum and make the clustering result inaccurate. This paper improves the initial centroid selection: a local density index is introduced for each sample point and, according to the local density distribution, the points at density peaks are chosen as initial centroids. The resulting initial centroids are stable and lie very close to the converged centroids, which reduces the number of iterations, improves running efficiency, lowers the probability of falling into a local optimum, and significantly improves clustering accuracy. Experimental results show that, compared with several existing algorithms, the proposed algorithm has a clear advantage in text clustering.
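The idea in the abstract, seeding K-means with points at local density peaks, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not give the density formula, so the Gaussian-kernel density, the percentile-based cutoff distance `dc`, and the `rho * delta` ranking are assumptions borrowed from the general density-peaks clustering method. The function names `density_peak_init` and `kmeans` and the parameter `dc_percentile` are hypothetical, and the rows of `X` are assumed to be already vectorized documents (for example TF-IDF vectors) compared with Euclidean distance.

```python
import numpy as np

def density_peak_init(X, k, dc_percentile=2.0):
    """Pick k initial centroids at density peaks (assumed rho * delta ranking)."""
    n = len(X)
    # Pairwise Euclidean distances between all samples.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Cutoff distance d_c: assumed to be a low percentile of pairwise distances.
    dc = max(np.percentile(dist[np.triu_indices(n, k=1)], dc_percentile), 1e-12)
    # Local density rho_i: Gaussian kernel sum, minus the self-contribution.
    rho = np.exp(-(dist / dc) ** 2).sum(axis=1) - 1.0
    # delta_i: distance to the nearest sample with a higher density.
    order = np.argsort(-rho)
    delta = np.empty(n)
    delta[order[0]] = dist[order[0]].max()  # densest point gets the max distance
    for rank in range(1, n):
        i = order[rank]
        delta[i] = dist[i, order[:rank]].min()
    # Peaks are dense AND far from any denser point: rank by rho * delta.
    peaks = np.argsort(-(rho * delta))[:k]
    return X[peaks]

def kmeans(X, centroids, max_iter=100):
    """Plain Lloyd iterations starting from the given centroids."""
    centroids = centroids.copy()
    for iteration in range(1, max_iter + 1):
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each centroid; keep the old one if a cluster is empty.
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(len(centroids))
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids, iteration

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-in for document vectors: three well-separated groups.
    X = np.vstack([rng.normal(c, 0.5, size=(50, 2))
                   for c in ([0, 0], [10, 10], [-10, 10])])
    labels, centroids, iterations = kmeans(X, density_peak_init(X, k=3))
    print(centroids, iterations)
```

Because the seeds are computed deterministically from the data, repeated runs start from the same centroids, which is the stability property the abstract refers to. The full pairwise distance matrix costs O(n²) memory, so this sketch only suits small collections.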

Keywords: text clustering; density peaks; F-measure; K-means; vectorization

Classification code: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
