Incremental Algorithm of Text Soft Clustering

Cited by: 3

Source: Journal of Xi'an Jiaotong University, 2007, No. 4, pp. 398-401, 411 (5 pages)

Funding: National Natural Science Foundation of China (Grant No. 60673087)

Abstract: Traditional text clustering algorithms have high time complexity, distance-independent algorithms are unsuitable for dynamic, changing text collections, and traditional algorithms cannot account for the multi-topic nature of a single text. To address these problems, an incremental text soft clustering algorithm based on semantic sequences is proposed. The algorithm takes the multi-topic nature of long texts into account and uses the similarity relation between semantic sequences to compute the coverage of each set of similar semantic sequences. At each step, the candidate cluster with the minimum entropy overlap value is selected as a result cluster, which greatly reduces the dimensionality of the text vector space during clustering and shortens the computing time. Because the semantic sequences of a text depend only on the text itself, the algorithm is suitable for incremental clustering. Experimental results show that its clustering precision is higher than that of other clustering algorithms under the same conditions, and that it is especially well suited to soft clustering of long-text collections.

Keywords: semantic sequence; incremental clustering; soft clustering; text clustering

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
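
The selection step mentioned in the abstract (repeatedly taking the candidate cluster with the minimum entropy overlap) can be illustrated with a small sketch. This record does not give the paper's definitions, so the code below is only an assumption-laden illustration: it borrows the entropy overlap from frequent-term-based clustering (FTC), where a document contained in f candidates contributes -(1/f) ln(1/f), and it treats each candidate simply as a set of document IDs. The names `entropy_overlap`, `greedy_soft_clusters`, and the labels `s1`..`s3` are hypothetical and do not come from the paper.

```python
import math


def entropy_overlap(candidate, candidates):
    """Assumed FTC-style entropy overlap of one candidate cluster.

    Each document contributes -(1/f) * ln(1/f), where f is the number of
    candidates (including this one) that contain the document.
    """
    total = 0.0
    for doc in candidate:
        f = sum(1 for c in candidates if doc in c)
        total += -(1.0 / f) * math.log(1.0 / f)
    return total


def greedy_soft_clusters(candidates):
    """Greedily pick candidates with minimum entropy overlap.

    candidates: dict mapping a label to a set of document IDs.
    Returns the labels in selection order. Documents are NOT removed from
    the remaining candidates, so clusters may overlap (soft clustering).
    """
    remaining = {label: set(docs) for label, docs in candidates.items() if docs}
    uncovered = set().union(*remaining.values()) if remaining else set()
    selected = []
    while uncovered:
        # Only candidates that still cover an uncovered document are useful.
        useful = [label for label, docs in remaining.items() if docs & uncovered]
        # Ties on entropy overlap are broken by label to keep runs deterministic.
        best = min(
            useful,
            key=lambda label: (
                entropy_overlap(remaining[label], list(remaining.values())),
                label,
            ),
        )
        selected.append(best)
        uncovered -= remaining.pop(best)
    return selected


# Hypothetical candidates: document IDs covered by three similar-sequence sets.
example = {"s1": {1, 2, 3}, "s2": {3, 4}, "s3": {1, 2, 3, 4, 5}}
print(greedy_soft_clusters(example))
```

Because documents stay in the remaining candidates after a selection, a document can end up in several result clusters, which is the soft-clustering behaviour the abstract describes. How the paper builds candidates from semantic sequences and handles newly arriving texts is not covered by this sketch.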
