Research on a Text Clustering Algorithm Based on Association Rules (cited by: 5)

Source: Application Research of Computers (《计算机应用研究》), 2008, No. 4, pp. 986-988 (3 pages)

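Funding: National Natural Science Foundation of China (60573065); National "863" Program (2002AA4Z3240); Ministry of Education World Bank Loan Project for Higher Education Teaching Reform at the Beginning of the 21st Century (1283B0843)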

Abstract: K-means is currently one of the better-performing algorithms for text clustering. Its similarity computation is usually based on word-frequency statistics, so it clusters short documents and simple sentences poorly, because their word frequencies are too low. To address this, the paper proposes a similarity measure based on the degree of association between words. An association rule algorithm is first run on the set of simple documents to obtain keyword-based association rules, and a word association matrix is derived from these rules. Each text is then represented as a feature vector of word weights. Finally, the similarity between sentences is computed from the association matrix and the text feature vectors. Experiments show that the algorithm yields good clustering results, and that it draws not only on word-frequency statistics but also on the relationships between words.

Keywords: text mining; K-means clustering; association rules; weight

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]
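
The abstract outlines a pipeline (association rules, then a word association matrix, then weighted feature vectors, then sentence similarity) but does not give the formulas. The following is a minimal sketch of one way such a pipeline could look, not the paper's actual method. Every specific here is an assumption made for illustration: the function names, the use of two-item rule confidence averaged in both directions as the association degree, the `min_support` threshold, normalized term frequency as the weight, and the normalized bilinear form used as the similarity.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def association_matrix(docs, min_support=2):
    """Pairwise word association derived from two-item association rules.

    Assumption: association(a, b) is the mean confidence of the rules
    a -> b and b -> a, kept only when the pair co-occurs in at least
    `min_support` documents.
    """
    doc_freq = Counter()   # number of documents containing each word
    pair_freq = Counter()  # number of documents containing each word pair
    for doc in docs:
        words = sorted(set(doc))
        doc_freq.update(words)
        pair_freq.update(combinations(words, 2))

    assoc = {}
    for (a, b), n_ab in pair_freq.items():
        if n_ab < min_support:
            continue
        value = 0.5 * (n_ab / doc_freq[a] + n_ab / doc_freq[b])
        assoc[(a, b)] = assoc[(b, a)] = value  # stored symmetrically
    return assoc


def feature_vector(doc):
    """Term-frequency weights normalized to sum to 1 (assumed weighting)."""
    counts = Counter(doc)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def _bilinear(u, v, assoc):
    # sum_i sum_j u_i * A_ij * v_j, with A_ii = 1 and unlisted pairs = 0
    return sum(
        wu * wv * (1.0 if a == b else assoc.get((a, b), 0.0))
        for a, wu in u.items()
        for b, wv in v.items()
    )


def similarity(doc1, doc2, assoc):
    """Association-aware similarity: u^T A v / sqrt((u^T A u) * (v^T A v))."""
    u, v = feature_vector(doc1), feature_vector(doc2)
    denom = sqrt(_bilinear(u, u, assoc) * _bilinear(v, v, assoc))
    return _bilinear(u, v, assoc) / denom if denom else 0.0
```

Under these assumptions, two one-word sentences such as `["car"]` and `["automobile"]` share no term, so a purely frequency-based cosine similarity would be zero, whereas `similarity` returns a positive score whenever the two words co-occur often enough elsewhere in the corpus. A score of this kind could then replace the frequency-only similarity in a K-means-style clustering loop.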
