An improved k-means algorithm for text clustering  Cited by: 5


Authors: 张银明[1], 黄廷磊[1], 林科[1], 张嫱嫱

Affiliation: [1] School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, Guangxi 541004

Source: Journal of Guilin University of Electronic Technology (《桂林电子科技大学学报》), 2016, No. 4, pp. 311-314 (4 pages)

Funding: National High Technology Research and Development Program of China (863 Program) (2012AA011005)

Abstract: In text clustering, k-means selects its initial cluster centroids at random, so the clustering result tends to fall into a local optimum; outliers and an undetermined number of clusters further lower accuracy and slow convergence. To address these problems, an improved k-means text clustering algorithm is proposed. The algorithm uses FP-growth to mine frequent itemsets from the text, filters them to obtain the core frequent itemsets, and uses the core frequent itemsets to guide generation of the initial cluster centroids and the number of clusters. k-means then completes the text clustering from these initial centroids and this cluster count. Text clustering experiments on a Sina Weibo dataset show that the improved k-means algorithm improves clustering accuracy, accelerates convergence, and is strongly robust.

Keywords: text clustering; FP-growth; k-means

Classification code: TP311 [Automation and Computer Technology: Computer Software and Theory]
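
The pipeline described in the abstract (mine frequent itemsets, filter them to core itemsets, derive the initial centroids and the cluster count, then run k-means) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: a naive enumeration of term combinations stands in for FP-growth, "core" itemsets are taken to be maximal frequent itemsets with at least two terms (the abstract does not state the filtering rule), documents are already tokenized term lists, and the names `frequent_itemsets`, `core_itemsets`, `seeded_kmeans`, and `DOCS` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(docs, min_support, max_size=3):
    """Naive stand-in for FP-growth: count every term combination up to max_size."""
    counts = Counter()
    for doc in docs:
        terms = sorted(set(doc))
        for size in range(1, max_size + 1):
            for combo in combinations(terms, size):
                counts[frozenset(combo)] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


def core_itemsets(freq):
    """Assumed filter: keep maximal frequent itemsets with at least two terms."""
    cores = [s for s in freq if len(s) >= 2 and not any(s < t for t in freq)]
    return sorted(cores, key=lambda s: sorted(s))  # deterministic order


def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def seeded_kmeans(docs, min_support=2, max_iter=100):
    """k-means whose initial centroids and k both come from the core itemsets."""
    vocab = sorted({t for d in docs for t in d})
    vectors = [[float(Counter(d)[t]) for t in vocab] for d in docs]
    cores = core_itemsets(frequent_itemsets(docs, min_support))
    if not cores:
        raise ValueError("no core itemsets found; lower min_support")
    # One centroid per core itemset: the mean of the documents containing it.
    centroids = [
        mean([v for d, v in zip(docs, vectors) if s <= set(d)]) for s in cores
    ]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda j: sq_dist(v, centroids[j]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = mean(members)
    return labels, cores


# Toy, already-tokenized documents (illustrative data, not from the paper).
DOCS = [
    ["apple", "fruit", "juice"],
    ["apple", "fruit", "pie"],
    ["apple", "fruit"],
    ["goal", "football", "match"],
    ["goal", "football", "team"],
    ["football", "goal"],
]

labels, cores = seeded_kmeans(DOCS)
print(len(cores), labels)
```

Because each centroid starts at the mean of documents that share a core itemset, the number of clusters follows from the mined itemsets rather than being supplied by hand, which is the idea the abstract describes. On real data a proper FP-growth implementation would replace the exhaustive enumeration, whose cost grows quickly with document length.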

 
