Chinese Text Clustering Based on the LLE-k-Means Method

A Method of LLE-k Means for Chinese Text Clustering

Source: Computer & Digital Engineering (《计算机与数字工程》), 2010, No. 11, pp. 10-12, 21 (4 pages)

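Funding: Supported by the National Natural Science Foundation of China (No. 60973094); the Natural Science Foundation of Jiangsu Province (No. BK2009538); the Natural Science Foundation of Jiangsu Higher Education Institutions (Nos. 08KJB520002, 09KJB520004); and a national fund project (No. 61070121).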

Abstract: In text clustering, the high dimensionality of text feature vectors makes it very difficult to estimate the statistical characteristics of the samples, so effective dimensionality reduction is necessary. The locally linear embedding (LLE) algorithm exploits the local symmetries of linear reconstructions to uncover nonlinear structure in a high-dimensional data space, and maps each high-dimensional data point to a corresponding low-dimensional point while preserving the neighborhood relationships among the data points. This paper applies the LLE-k-means method to Chinese text clustering: LLE first reduces the dimensionality, and the resulting feature vectors are then clustered with k-means. In a comparison with PCA, ISOMAP, and LLE, the results show that LLE-k-means gives a better visualization.

Keywords: text clustering; LLE; dimensionality reduction; k-means

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]
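
The two-stage pipeline described in the abstract (LLE for dimensionality reduction, then k-means on the embedded vectors) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy corpus, the character-bigram features, the helper names (`bigram_vectors`, `lle`, `kmeans`), and all parameter values (`n_neighbors=2`, `reg=1e-3`, two clusters) are assumptions made for the example, since the abstract does not specify them. The LLE part follows the standard formulation: solve for neighbor reconstruction weights, then take the bottom eigenvectors of (I - W)^T (I - W).

```python
import numpy as np

def bigram_vectors(docs):
    """Character-bigram count vectors (an assumed stand-in for the paper's text features)."""
    vocab = sorted({d[i:i + 2] for d in docs for i in range(len(d) - 1)})
    index = {g: j for j, g in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))
    for r, d in enumerate(docs):
        for i in range(len(d) - 1):
            X[r, index[d[i:i + 2]]] += 1
    return X

def lle(X, n_neighbors, n_components, reg=1e-3):
    """Locally linear embedding of the rows of X (n_samples x n_features)."""
    n = X.shape[0]
    # Pairwise squared distances; a point is never its own neighbor.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)
    neighbors = np.argsort(d2, axis=1)[:, :n_neighbors]

    # Step 1: weights that best reconstruct each point from its neighbors.
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[neighbors[i]] - X[i]          # neighbors centered on x_i
        G = Z @ Z.T                         # local Gram matrix
        trace = np.trace(G)
        G += np.eye(n_neighbors) * reg * (trace if trace > 0 else 1.0)
        w = np.linalg.solve(G, np.ones(n_neighbors))
        W[i, neighbors[i]] = w / w.sum()    # constrain weights to sum to one

    # Step 2: low-dimensional coordinates reconstructed by the same weights.
    I = np.eye(n)
    M = (I - W).T @ (I - W)
    _, eigvecs = np.linalg.eigh(M)          # eigenvalues in ascending order
    # Skip the first (constant) eigenvector and keep the next n_components.
    return eigvecs[:, 1:n_components + 1]

def kmeans(Y, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means on the rows of Y; returns one label per row."""
    rng = np.random.default_rng(seed)
    centers = Y[rng.choice(len(Y), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Hypothetical toy corpus (not data from the paper): sports vs. finance.
docs = [
    "足球比赛今晚开始", "篮球比赛明天举行", "球队赢得了比赛",
    "股票市场今天上涨", "银行下调贷款利率", "股票市场出现下跌",
]
X = bigram_vectors(docs)
embedding = lle(X, n_neighbors=2, n_components=2)   # reduce to 2-D
labels = kmeans(embedding, k=2)                     # cluster in the embedded space
print(embedding.shape, labels)
```

On a real corpus, word segmentation and TF-IDF weighting would likely replace the bigram counts, and the neighborhood size and target dimension would need tuning; the two-dimensional `embedding` is also what one would plot to inspect the visualization quality the abstract refers to.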
