Authors: Huang Jianyu (黄建宇)[1], Zhou Aiwu (周爱武)[1], Xiao Yun (肖云)[1], Tan Tiancheng (谭天诚)[1]
Affiliation: [1] School of Computer Science and Technology, Anhui University, Hefei, Anhui 230601
Source: Computer Technology and Development (《计算机技术与发展》), 2017, No. 9, pp. 75-77, 81 (4 pages)
Funding: Anhui University Undergraduate Research Training Program (J18520148)
Abstract: Text clustering is a specific application of clustering algorithms. With the development of the Internet, text clustering has come into increasingly wide use, for example in information retrieval and intelligent search engines. A text clustering pipeline consists mainly of text preprocessing and the clustering algorithm itself, so improvements can target either of these two parts. Traditional text clustering preprocesses documents with the vector space model (VSM), which ignores both the semantic similarity and the correlation between words, and this leads to very low clustering accuracy. To address this problem, a text clustering method based on the feature space is proposed. The method constructs a substitution word library from the feature space of the document collection, determines each document's topic from this library, and then replaces the words in the document using the domain dictionary corresponding to that topic. Traditional text clustering also relies on the K-means algorithm, which requires the value of K to be specified manually; an improved K-means algorithm based on K-value optimization is therefore proposed. Experimental results show that the proposed text clustering method and the improved K-means algorithm significantly improve the intelligence and accuracy of text clustering.
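To make the VSM limitation and the word-replacement idea concrete, here is a minimal sketch. It is not the authors' implementation: the topic dictionaries in `DOMAIN_DICTS`, the helpers `detect_topic` and `normalize`, and the toy documents are all hypothetical, and topic detection is reduced to counting dictionary hits instead of being derived from the collection's feature space as the paper describes. Under plain term-frequency VSM, two documents that use synonyms share almost no dimensions; after each word is replaced by a canonical term from the detected topic's dictionary, their cosine similarity rises.

```python
import math
from collections import Counter
from typing import Dict, List

# Hypothetical per-topic domain dictionaries: word -> canonical term.
# (Assumption: a toy stand-in for the paper's feature-space substitution library.)
DOMAIN_DICTS: Dict[str, Dict[str, str]] = {
    "hardware": {"pc": "computer", "laptop": "computer", "notebook": "computer",
                 "cpu": "processor", "chip": "processor"},
    "finance": {"stock": "share", "equity": "share",
                "profit": "earnings", "income": "earnings"},
}

def tf_vector(tokens: List[str]) -> Counter:
    """Plain VSM term-frequency vector: one dimension per distinct word."""
    return Counter(tokens)

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def detect_topic(tokens: List[str]) -> str:
    """Simplified topic detection: the dictionary covering the most tokens wins."""
    return max(DOMAIN_DICTS, key=lambda t: sum(tok in DOMAIN_DICTS[t] for tok in tokens))

def normalize(tokens: List[str]) -> List[str]:
    """Replace words with canonical terms from the detected topic's dictionary."""
    table = DOMAIN_DICTS[detect_topic(tokens)]
    return [table.get(tok, tok) for tok in tokens]

doc_a = ["laptop", "cpu", "fast"]
doc_b = ["pc", "chip", "fast"]

print(round(cosine(tf_vector(doc_a), tf_vector(doc_b)), 3))  # 0.333: only "fast" is shared
print(round(cosine(tf_vector(normalize(doc_a)), tf_vector(normalize(doc_b))), 3))  # 1.0
```

Both documents describe the same thing, but raw VSM treats "laptop"/"pc" and "cpu"/"chip" as unrelated dimensions; the substitution step collapses them onto shared terms before vectorization.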
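The abstract does not state how the improved K-means chooses K. As one common way to avoid fixing K by hand, the sketch below runs K-means for several candidate values and keeps the one with the highest mean silhouette coefficient. The silhouette criterion, the deterministic farthest-point seeding, the candidate range `k_min`..`k_max`, and the 2-D toy points are all assumptions standing in for the paper's unspecified K-value optimization; real documents would be clustered as term vectors rather than 2-D points.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def dist(p: Point, q: Point) -> float:
    return math.hypot(p[0] - q[0], p[1] - q[1])

def seed_farthest(points: Sequence[Point], k: int) -> List[Point]:
    """Deterministic seeding (assumption): start at the first point, then repeatedly
    add the point farthest from its nearest already-chosen centroid."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    return centroids

def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's K-means; returns one cluster label per point."""
    centroids = seed_farthest(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return labels

def mean_silhouette(points: Sequence[Point], labels: List[int]) -> float:
    """Mean silhouette coefficient; points in singleton clusters score 0."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            continue
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(dist(p, q) for j, q in enumerate(points) if labels[j] == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def choose_k(points: Sequence[Point], k_min: int = 2, k_max: int = 6) -> Tuple[int, List[int]]:
    """Run K-means for each candidate K and keep the K with the best mean silhouette."""
    best_k, best_score, best_labels = k_min, -2.0, []
    for k in range(k_min, min(k_max, len(points) - 1) + 1):
        labels = kmeans(points, k)
        score = mean_silhouette(points, labels)
        if score > best_score:
            best_k, best_score, best_labels = k, score, labels
    return best_k, best_labels

points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (5.0, 5.0), (5.1, 4.9), (4.8, 5.2),
          (9.0, 1.0), (9.2, 0.9), (8.9, 1.2)]
k, labels = choose_k(points)
print(k)  # 3 for these three well-separated groups
```

Splitting a tight group into two clusters lowers the silhouette of its points, and merging two distant groups inflates their within-cluster distance, so the score peaks at the natural number of groups. The trade-off is running K-means once per candidate K.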
CLC number: TP301.6 [Automation and Computer Technology / Computer System Architecture]