An improvement of text clustering method based on multi-view  Cited by: 3


Authors: WANG Weihong[1]; LI Fan; JIN Lingjian (College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China)

Affiliation: [1] College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, Zhejiang 310023

Source: Journal of Zhejiang University of Technology, 2021, No. 1, pp. 1-8 (8 pages)

Funding: Supported by the Natural Science Foundation of Zhejiang Province (LZ14F020001).

Abstract: In recent years, with the development of natural language processing, clustering has played an increasingly important role in text processing. Domestic research on multi-view text clustering is still at an early stage: commonly used clustering methods rely on a single view of the text and so capture only one aspect of its cluster structure, although a growing body of text clustering research is moving from single-view to multi-view approaches. This paper proposes an improved multi-view semi-supervised text clustering method based on spectral clustering, using the LDA topic model and the TF-WIDF feature extraction algorithm to form the feature vector group. The method builds on the semi-supervised Co-training algorithm; by modifying how texts are labeled in Co-training, it obtains a multi-view co-training algorithm that operates in an unsupervised manner. Experimental results show that, compared with traditional single-view text clustering algorithms, the improved algorithm largely avoids the contingency and limitations of single-view algorithms and improves overall clustering accuracy.

Keywords: text clustering; LDA; TF-WIDF; Co-training; spectral clustering

Classification No.: TP391.1 [Automation and Computer Technology: Computer Application Technology]
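
The abstract does not say how the labeling step of Co-training is modified, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes each view (for example, spectral clustering on LDA topic vectors and on term-weight vectors) has already produced a cluster assignment, and it treats the documents on which both views agree as pseudo-labels that could seed a further co-training round without human annotation. The function names `align_labels` and `agreed_pseudo_labels` and the sample assignments are hypothetical.

```python
from itertools import permutations


def align_labels(labels_a, labels_b, k):
    """Relabel view B's clusters so they overlap view A's as much as possible.

    Cluster ids are arbitrary per view, so we try every permutation of the
    k ids (brute force, suitable only for small k) and keep the best one.
    """
    best_perm, best_hits = None, -1
    for perm in permutations(range(k)):
        hits = sum(1 for a, b in zip(labels_a, labels_b) if a == perm[b])
        if hits > best_hits:
            best_perm, best_hits = perm, hits
    return [best_perm[b] for b in labels_b]


def agreed_pseudo_labels(labels_a, labels_b, k):
    """Return {document index: cluster id} for documents both views agree on."""
    aligned_b = align_labels(labels_a, labels_b, k)
    return {
        i: a
        for i, (a, b) in enumerate(zip(labels_a, aligned_b))
        if a == b
    }


# Hypothetical cluster assignments for six documents (assumed, not from the paper).
topic_view = [0, 0, 1, 1, 2, 2]  # e.g. clustering on topic-model vectors
term_view = [2, 2, 0, 1, 1, 1]   # e.g. clustering on term-weight vectors

pseudo = agreed_pseudo_labels(topic_view, term_view, k=3)
print(pseudo)
```

In this toy input the two views disagree only on document 3, so it is left unlabeled while the other five documents receive pseudo-labels. Agreement between views is used here as a stand-in for the confidence criterion that a real co-training variant would need to define.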

 
