Two Spectral Algorithms for Ensembling Document Clusters

Cited by: 20

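Affiliations: [1] College of Computer Science and Technology, Harbin Engineering University, Harbin 150001; [2] College of Information and Communication Engineering, Harbin Engineering University, Harbin 150001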

Source: Acta Automatica Sinica, 2009, No. 7, pp. 997-1002 (6 pages)

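Funding: Supported by the National Natural Science Foundation of China (60603092) and the Doctoral Program Foundation of the Ministry of Education of China (20070217043)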

Abstract: A critical problem in cluster ensembles is how to combine multiple clusterers into a better final clustering. In this paper, the idea of spectral clustering is brought to the document cluster ensemble problem. Spectral clustering, however, must solve the eigenvalue decomposition of a large-scale matrix to obtain the low-dimensional embedding of the documents used for the subsequent clustering step. A first ensemble algorithm is therefore proposed that uses algebraic transformations to convert the large-scale eigenvalue decomposition into an equivalent singular value decomposition problem, and then into the eigenvalue decomposition of a much smaller matrix. The characteristics of spectral clustering are then investigated further and a second ensemble algorithm is proposed, in which the low-dimensional embedding of the documents is obtained indirectly from the low-dimensional embedding of the hyperedges. Experiments on the TREC and Reuters document sets show that both proposed spectral algorithms outperform, and are more robust than, other cluster ensemble techniques based on graph partitioning, and that they solve the document cluster ensemble problem effectively.
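The reduction described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not state the matrices involved, so the sketch assumes a common hypergraph formulation in which each cluster of each base clustering is a hyperedge, `H` is the document-by-hyperedge indicator matrix, and the normalized similarity is `B @ B.T` with `B = D^(-1/2) H`. The function names `incidence_matrix` and `spectral_embedding_small` are hypothetical. Under these assumptions the leading eigenvectors of the large n x n matrix `B @ B.T` are the left singular vectors of `B`, which can be recovered from the eigenvectors of the small t x t matrix `B.T @ B`.

```python
import numpy as np

def incidence_matrix(labelings):
    """Stack one-hot cluster indicators of every base clustering into H (n x t).

    Each column is one hyperedge (one cluster of one base clustering).
    """
    blocks = []
    for labels in labelings:
        labels = np.asarray(labels)
        ids = np.unique(labels)
        blocks.append((labels[:, None] == ids[None, :]).astype(float))
    return np.hstack(blocks)

def spectral_embedding_small(H, k):
    """Top-k eigenpairs of B B^T (n x n) computed via B^T B (t x t).

    Assumed normalization (not taken from the paper): B = D^(-1/2) H,
    where D holds the row sums of the similarity matrix S = H H^T.
    """
    d = H @ H.sum(axis=0)            # row sums of H H^T without forming it
    B = H / np.sqrt(d)[:, None]
    small = B.T @ B                  # t x t, with t much smaller than n
    w, V = np.linalg.eigh(small)     # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:k]    # keep the k largest (must be positive)
    w, V = w[idx], V[:, idx]
    U = B @ V / np.sqrt(w)           # left singular vectors of B
    return U, V, w

# Two base clusterings of six documents (toy data for illustration only).
H = incidence_matrix([[0, 0, 1, 1, 2, 2], [0, 0, 0, 1, 1, 1]])
U, V, w = spectral_embedding_small(H, k=2)
print(U.shape, V.shape)
```

In this sketch the rows of `V` play the role of hyperedge embeddings and the rows of `U = B V / sqrt(w)` are document embeddings derived from them, which loosely mirrors the indirect route mentioned for the second algorithm; the rows of `U` would then be passed to a conventional clusterer such as k-means. `k` must not exceed the number of strictly positive eigenvalues of `B.T @ B`.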

Keywords: cluster analysis; cluster ensemble; spectral clustering; document clustering

Classification: TP391.41 [Automation and Computer Technology: Computer Application Technology]
