Text clustering method based on R-Grams (Cited by: 1)

Novel text clustering approach based on R-Grams

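Affiliations: [1] Oujiang College, Wenzhou University, Wenzhou, Zhejiang 325035; [2] Wenzhou Informatization Research Center, Wenzhou, Zhejiang 325035; [3] School of Mathematics and Computer Science, Hubei University of Arts and Science, Xiangyang, Hubei 441053; [4] Research Center for Logic and Intelligence, Southwest University, Chongqing 400715; [5] School of New Media, Zhejiang University of Media and Communications, Hangzhou 310018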

Source: Journal of Computer Applications (《计算机应用》), 2015, No. 11, pp. 3130-3134 (5 pages)

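Funding: Natural Science Foundation of Zhejiang Province (LY13F010005); Humanities and Social Sciences Research Project of the Ministry of Education (15YJAZH015); Soft Science Project of the Hubei Provincial Science and Technology Support Program (2015BDH109); Wenzhou Science and Technology Program (R20130021)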

Abstract: To address the difficulty of balancing clustering accuracy and recall in traditional text clustering, a text clustering approach based on the R-Grams text similarity algorithm is proposed. First, the documents to be clustered are sorted in descending order. Second, the R-Grams text similarity algorithm is used to compute the similarity between documents; based on these similarities, a marker (symbolic) document is identified for each cluster and the initial clustering is formed. Finally, the initial clusters are merged to produce the final clustering. The experimental results show that the clustering result can be adjusted flexibly through the clustering threshold to suit different requirements, with the best threshold at about 15. As the clustering threshold increases, the accuracy of each cluster increases, while recall first rises and then falls. In addition, the approach avoids laborious processing such as word segmentation and feature extraction, and it is simple to implement.
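The three steps described in the abstract can be illustrated with a minimal sketch. Several things here are assumptions, not taken from the paper: the abstract does not define the R-Grams similarity, so a plain character n-gram overlap (`similarity`, a hypothetical helper) stands in for it; "descending order" is assumed to mean descending document length; the similarity is expressed on a 0-100 scale so that a threshold such as 15 is meaningful; and the merge step uses a simple single-link rule. The function name `cluster` and its parameters are likewise illustrative.

```python
from typing import List, Set


def char_ngrams(text: str, n: int = 2) -> Set[str]:
    """Set of character n-grams of text (a stand-in, NOT the paper's R-Grams)."""
    if len(text) < n:
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def similarity(a: str, b: str, n: int = 2) -> float:
    """Assumed measure: percentage (0-100) of the smaller n-gram set
    that also occurs in the larger one."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    if not ga or not gb:
        return 0.0
    small, large = (ga, gb) if len(ga) <= len(gb) else (gb, ga)
    return 100.0 * len(small & large) / len(small)


def cluster(docs: List[str], threshold: float = 15.0) -> List[List[str]]:
    # Step 1: sort documents in descending order (assumed: by length).
    ordered = sorted(docs, key=len, reverse=True)

    # Step 2: initial clustering. The first document of each cluster acts
    # as its marker document; a new document joins the most similar marker
    # if the similarity reaches the threshold, else it becomes a new marker.
    clusters: List[List[str]] = []
    for doc in ordered:
        best, best_sim = None, threshold
        for c in clusters:
            s = similarity(c[0], doc)
            if s >= best_sim:
                best, best_sim = c, s
        if best is None:
            clusters.append([doc])
        else:
            best.append(doc)

    # Step 3: merge initial clusters (assumed rule: single link, i.e. any
    # cross-cluster pair of documents reaching the threshold).
    merged: List[List[str]] = []
    for c in clusters:
        for m in merged:
            if any(similarity(x, y) >= threshold for x in m for y in c):
                m.extend(c)
                break
        else:
            merged.append(c)
    return merged
```

Calling `cluster(docs, threshold=50.0)` on a small list of strings returns a list of clusters, each a list of documents with its marker first. Raising the threshold in this sketch yields more and tighter clusters, which matches the direction of the trade-off the abstract reports; it does not reproduce the paper's measured results.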

Keywords: text clustering; random; R-Grams

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
