Text Clustering Based on Feature Space (基于特征空间的文本聚类)

Cited by: 8

Source: Computer Technology and Development (《计算机技术与发展》), 2017, No. 9, pp. 75-77, 81 (4 pages)

Funding: Anhui University Undergraduate Research Training Program (J18520148)

Abstract: Text clustering is a specific application of clustering algorithms. With the growth of the Internet it is used ever more widely, for example in information retrieval and intelligent search engines. Text clustering mainly involves two stages, text preprocessing and the clustering algorithm itself, so improvements can start from either one. Traditional text clustering preprocesses text with the vector space model (VSM), which ignores both the semantic similarity and the correlation between words and therefore yields very low clustering accuracy. To address this, a text clustering method based on feature space is proposed. The method builds a substitute word library from the feature space of the document collection, derives each document's topic from that library, and then replaces words in the document according to the topic and its corresponding domain dictionary. Traditional text clustering also relies on the K-means algorithm, which requires K to be specified manually. An improved K-means algorithm based on K-value optimization is therefore proposed. Experimental results show that the proposed clustering method and the improved K-means algorithm significantly improve the intelligence and precision of text clustering.
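The abstract does not spell out either algorithm, so the following is only a rough sketch of the two ideas it names: replacing words with canonical terms before vectorization, and choosing K automatically instead of by hand. Everything in it is an assumption made for illustration. The `SUBSTITUTES` dictionary is a hand-written stand-in for the paper's substitute word library and domain dictionary, the function names are hypothetical, and the silhouette score is a swapped-in selection criterion, not the paper's K-value optimization.

```python
import math
import random
from collections import Counter

# Hypothetical substitute library: maps related words to one canonical term.
SUBSTITUTES = {"automobile": "car", "vehicle": "car",
               "physician": "doctor", "medic": "doctor"}

def normalize(tokens, substitutes=SUBSTITUTES):
    """Replace each token with its canonical term when one is listed."""
    return [substitutes.get(t, t) for t in tokens]

def vectorize(docs):
    """Build L2-normalized term-frequency vectors over a shared vocabulary."""
    vocab = sorted({t for d in docs for t in d})
    vectors = []
    for d in docs:
        counts = Counter(d)
        v = [float(counts[t]) for t in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            # Keep the old center if a cluster ends up empty.
            new_centers.append([sum(col) / len(members) for col in zip(*members)]
                               if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def silhouette(points, labels):
    """Mean silhouette score; requires at least two distinct labels."""
    scores = []
    for i, p in enumerate(points):
        same = [dist(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        if not same:  # singleton cluster scores 0 by convention
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        b = min(sum(dist(p, q) for j, q in enumerate(points) if labels[j] == c)
                / labels.count(c)
                for c in set(labels) if c != labels[i])
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

def choose_k(points, k_max):
    """Try K = 2..k_max and keep the K with the highest silhouette score."""
    best_k, best_score = 2, -1.0
    for k in range(2, min(k_max, len(points) - 1) + 1):
        labels = kmeans(points, k)
        if len(set(labels)) < 2:
            continue
        score = silhouette(points, labels)
        if score > best_score:
            best_k, best_score = k, score
    return best_k

if __name__ == "__main__":
    raw = [["automobile", "engine"], ["car", "engine"], ["vehicle", "wheel"],
           ["physician", "hospital"], ["doctor", "hospital"], ["medic", "clinic"]]
    pts = vectorize([normalize(d) for d in raw])
    k = choose_k(pts, 4)
    print(k, kmeans(pts, k))
```

The substitution step is what lets documents that use "automobile" and "car" land on the same vector dimension, which a plain VSM would treat as unrelated terms.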

Keywords: HowNet; domain dictionary; topic; sememe; clustering; K-value optimization

Classification number: TP301.6 [Automation and Computer Technology: Computer System Architecture]
