Application of the DMK Algorithm to Chinese Text Clustering

Cited by: 1

Authors: Ji Shengjie, Ge Wancheng [1] (Sino-German College, Tongji University, Shanghai 200092, China)

Affiliation: [1] Sino-German College, Tongji University, Shanghai 200092

Source: Information & Communications (《信息通信》), 2018, No. 7, pp. 1-4 (4 pages)

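Funding: Research project of the Science and Technology Commission of Shanghai Municipality, "Research and Application of an Aviation Mobile Community Service Model Based on Personalized Recommendation Technology" (Project No. 14DZ1101400)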

Abstract: This paper studies Chinese text clustering and applies the improved DMK (Density-based and Max-min-distance K-means) algorithm to real Chinese text. Entries of different types from a Baidu Encyclopedia data set were preprocessed (word segmentation, stop-word removal, feature selection, dimensionality reduction) and then clustered with both the original K-means algorithm and the DMK algorithm. The clustering results were evaluated with the F-measure, the Rand Index (RI), and other metrics. For the Baidu Encyclopedia Chinese data set used in the experiment, the DMK algorithm improved the F-measure by 0.342% on average and the RI by 9.34% on average relative to the original algorithm, which verifies that the DMK algorithm brings a substantive improvement to real Chinese text clustering.
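The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not say which F-measure variant the authors used, so the pair-counting form below is an assumption; the Rand Index follows its standard definition (the fraction of item pairs on which the clustering and the reference labels agree). The function names `pair_counts`, `rand_index`, and `pairwise_f_measure` are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def pair_counts(truth, pred):
    """Classify every pair of items by class agreement and cluster agreement.

    Returns (tp, fp, fn, tn), where tp counts pairs that share both a
    reference class and a predicted cluster.
    """
    if len(truth) != len(pred):
        raise ValueError("truth and pred must have the same length")
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_class = truth[i] == truth[j]
        same_cluster = pred[i] == pred[j]
        if same_class and same_cluster:
            tp += 1
        elif same_cluster:
            fp += 1  # grouped together but belong to different classes
        elif same_class:
            fn += 1  # same class but split across clusters
        else:
            tn += 1
    return tp, fp, fn, tn


def rand_index(truth, pred):
    """Rand Index: share of pairs on which the two labelings agree."""
    tp, fp, fn, tn = pair_counts(truth, pred)
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 1.0


def pairwise_f_measure(truth, pred):
    """Pair-counting F1 (assumed variant; the paper may use another)."""
    tp, fp, fn, _ = pair_counts(truth, pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the paper.
    truth = [0, 0, 0, 1, 1, 1]
    pred = [0, 0, 1, 1, 1, 1]
    print(rand_index(truth, pred), pairwise_f_measure(truth, pred))
```

Both functions depend only on which items are grouped together, so cluster IDs do not need to match class IDs. The pair loop is quadratic in the number of documents, which is fine for small experiments but would need a contingency-table formulation for large corpora.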

Keywords: K-means; DMK algorithm; text clustering; word segmentation; text mining

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]
