A Short Text Clustering Algorithm Based on Spectral Cut

Times cited: 1

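Affiliations: [1] School of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070; [2] College of Information Science and Technology, Beijing Normal University, Beijing 100875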

Source: Computer Engineering, 2016, No. 8, pp. 178-182 (5 pages)

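Funding: National Natural Science Foundation of China (61163039; 61363058); Gansu Provincial Youth Science and Technology Fund (1308TJY085; 145RJYA259); Open Fund of the Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences (IIP2014-4)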

Abstract: Short texts are sparse and high-dimensional, and existing clustering algorithms achieve low accuracy and low efficiency on large-scale short-text collections. To address this problem, a new clustering algorithm is proposed that is grounded in spectral clustering theory and based on the spectral cut criterion RMcut. Following spectral clustering theory, the short-text collection is modeled as a weighted undirected graph, and a document-document similarity matrix is computed to supply the information the clustering algorithm needs. The graph is then partitioned iteratively by 2-way cuts, with the RMcut value serving as the condition that decides whether partitioning terminates, and Prim's algorithm is used to add the vertices of the original graph into clusters so as to obtain high-quality clustering results. Experimental results show that the algorithm has good time performance and produces more accurate clustering results than the K-means algorithm, a word co-occurrence clustering algorithm, and an immunity-based clustering algorithm.

Keywords: short text; similarity matrix; weighted undirected graph; RMcut criterion; clustering algorithm

CLC number: TP391 [Automation and Computer Technology / Computer Application Technology]
