Study on K-Means Text Clustering Algorithm Based on Subspace Variable Self-Weighting  (Cited by: 1)


Authors: Ning Tao (宁涛)[1], Jin Bochen (晋博晨)[1], Song Cunli (宋存利)[1]

Affiliation: [1] School of Software, Dalian Jiaotong University, Dalian 116052, Liaoning, China

Source: Computer Applications and Software (《计算机应用与软件》), 2008, No. 8, pp. 251-253 (3 pages)

Abstract: K-means is among the most widely used text clustering techniques because it is fast, simple to implement, and highly scalable. However, the traditional algorithm treats all variables equally, and the text feature matrix is sparse, so its clustering results are often unsatisfactory. To overcome this drawback, this paper proposes an improved K-means text clustering algorithm that, during the clustering process, automatically computes a weight for each keyword in each cluster and assigns larger weights to the more important keywords. Experiments yielded an improved algorithm based on subspace variable self-weighting that is suited to text data clustering analysis: it clusters large-scale, high-dimensional, sparse text data effectively and produces high-quality clustering results. The experimental results show that the algorithm is effective for clustering large-scale text data.
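The abstract describes the idea (one weight per keyword per cluster, updated automatically during K-means) but does not state the weight update rule. The following is a minimal sketch of that idea, not the authors' implementation. It assumes an exponential update in which keywords with small within-cluster dispersion receive larger weights, a form common in the subspace clustering literature. The function name `weighted_kmeans`, the parameter `gamma`, and the first-k-rows initialization are all hypothetical choices made for illustration.

```python
import math


def weighted_kmeans(X, k, gamma=0.1, n_iter=20):
    """K-means with one feature-weight vector per cluster (illustrative sketch).

    X      : list of equal-length numeric rows, e.g. TF-IDF vectors.
    k      : number of clusters.
    gamma  : assumed smoothing parameter; smaller values give sharper weights.
    Returns (labels, centers, weights).
    """
    n, m = len(X), len(X[0])
    # Assumption: use the first k rows as initial centers so the run is deterministic.
    centers = [list(X[i]) for i in range(k)]
    # Start with equal weights, which reduces step 1 to ordinary K-means.
    weights = [[1.0 / m] * m for _ in range(k)]
    labels = [0] * n

    for _ in range(n_iter):
        # Step 1: assign each row to the cluster with the smallest
        # weighted squared distance, using that cluster's own weights.
        for i, x in enumerate(X):
            dists = [
                sum(w * (a - c) ** 2 for w, a, c in zip(weights[l], x, centers[l]))
                for l in range(k)
            ]
            labels[i] = dists.index(min(dists))

        # Step 2: recompute each center as the mean of its members.
        for l in range(k):
            members = [X[i] for i in range(n) if labels[i] == l]
            if members:  # an empty cluster keeps its previous center
                centers[l] = [sum(col) / len(members) for col in zip(*members)]

        # Step 3: update weights from per-feature dispersion within each cluster.
        # Assumed rule: weight proportional to exp(-dispersion / gamma).
        for l in range(k):
            disp = [0.0] * m
            for i in range(n):
                if labels[i] == l:
                    for j in range(m):
                        disp[j] += (X[i][j] - centers[l][j]) ** 2
            lo = min(disp)  # shift before exponentiating for numerical stability
            expd = [math.exp(-(d - lo) / gamma) for d in disp]
            total = sum(expd)
            weights[l] = [e / total for e in expd]

    return labels, centers, weights


# Toy data: features 0 and 1 separate two groups; feature 2 is noise.
docs = [
    [1.0, 0.0, 0.2],
    [0.0, 1.0, 0.9],
    [0.9, 0.0, 0.8],
    [0.1, 1.0, 0.1],
    [1.0, 0.1, 0.5],
    [0.0, 0.9, 0.5],
]
labels, centers, weights = weighted_kmeans(docs, k=2, gamma=0.05)
print(labels)
print(weights)
```

In this toy run the noisy third feature ends up with a much smaller weight in both clusters than the two discriminative features, which is the behavior the abstract attributes to the weighting step. Real text data would call for sparse vectors and a more careful initialization than the one assumed here.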

Keywords: text clustering; K-means; variable weighting; subspace

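Classification: TP391.12 [Automation and Computer Technology: Computer Application Technology]; TS105.11 [Automation and Computer Technology: Computer Science and Technology]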

 
