Reference-based k-means algorithm for document clustering

Cited by: 9

Source: Computer Engineering and Design (《计算机工程与设计》), 2009, No. 2, pp. 401-403, 407 (4 pages)

Abstract: k-means is a widely used algorithm for clustering document collections. Its main drawbacks are that the final number of clusters k and the corresponding initial centers must be specified by hand. To address these drawbacks, an initialization method based on reference regions is proposed that automatically generates the initial partition for k-means. While the reference regions are being generated, the threshold is determined automatically by a method that finds the maximum slope (in absolute value). Theoretical analysis and experimental results show that the improved algorithm effectively improves the accuracy of document clustering while keeping the computational cost feasible.
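The abstract does not spell out how the maximum-slope rule is applied, so the following is only a minimal sketch of one possible reading: sort a list of per-point scores in descending order, find the step with the largest absolute slope, and place the threshold across that step. The function name `max_slope_threshold`, the `densities` data, and the choice of the midpoint as the threshold are all assumptions made for illustration, not the authors' procedure.

```python
def max_slope_threshold(values):
    """Pick a threshold at the steepest step of the descending-sorted values.

    Assumption: this is an illustrative reading of "maximum slope
    (absolute value)"; the paper's exact procedure is not given in the abstract.
    """
    ordered = sorted(values, reverse=True)
    if len(ordered) < 2:
        raise ValueError("need at least two values")
    # With unit spacing on the x-axis, the slope between neighbours
    # is simply their difference; take its absolute value.
    slopes = [abs(ordered[i + 1] - ordered[i]) for i in range(len(ordered) - 1)]
    steepest = max(range(len(slopes)), key=slopes.__getitem__)
    # Hypothetical choice: put the threshold midway across the steepest step.
    return (ordered[steepest] + ordered[steepest + 1]) / 2.0


# Hypothetical density scores for seven candidate points.
densities = [12, 11, 10, 3, 2, 2, 1]
threshold = max_slope_threshold(densities)
# Points above the threshold would be candidates for reference regions.
reference_points = [d for d in densities if d > threshold]
```

Under this reading, the number of groups formed from the selected points would supply both k and the initial centers, which is what removes the manual choice described in the abstract.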

Keywords: document clustering; k-means; CURD; vector space model; reference region

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
