Authors: WANG Weihong (王卫红) [1]; LI Fan (李樊); JIN Lingjian (金凌剑)
Affiliation: [1] College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, Zhejiang, China
Source: Journal of Zhejiang University of Technology (《浙江工业大学学报》), 2021, No. 1, pp. 1-8 (8 pages)
Funding: Natural Science Foundation of Zhejiang Province (LZ14F020001).
Abstract: In recent years, as natural language processing has advanced, clustering has played an increasingly prominent role in text processing. Research on multi-view text clustering in China is still at an early stage: commonly used methods cluster on a single view of the text and therefore reflect only one specific aspect of it, although a growing share of text clustering research is moving from single-view to multi-view approaches. This paper proposes an improved multi-view semi-supervised text clustering method based on spectral clustering, which uses the LDA topic model and the TF-WIDF feature extraction algorithm to form the group of feature vectors. The method builds on the semi-supervised Co-training algorithm; by changing how Co-training labels texts, it obtains a multi-view co-training algorithm that is unsupervised in nature. The experimental results show that, compared with traditional single-view text clustering algorithms, the improved algorithm largely avoids the contingency and limitations of single-view algorithms and improves the overall accuracy of article clustering.
Keywords: text clustering; LDA; TF-WIDF; Co-training; spectral clustering
Classification: TP391.1 [Automation and Computer Technology - Computer Application Technology]
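
The abstract describes clustering texts from two feature views (term weights and LDA topics) with spectral clustering. The sketch below illustrates only that general two-view idea and is not the authors' method. It rests on several assumptions: scikit-learn is available, plain TF-IDF stands in for TF-WIDF (which this record does not define), and the two views are fused by simply averaging their cosine-similarity matrices instead of the improved Co-training procedure. The function name `two_view_spectral_clustering` and the sample documents are hypothetical.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def two_view_spectral_clustering(docs, n_clusters, n_topics=None, seed=0):
    """Cluster documents from two views fused into one affinity matrix.

    Assumption: averaging the per-view affinities is a simple stand-in for
    the co-training step described in the abstract, not a reproduction of it.
    """
    n_topics = n_topics or n_clusters

    # View 1: term-weight vectors (plain TF-IDF, standing in for TF-WIDF).
    tfidf = TfidfVectorizer().fit_transform(docs)
    affinity_terms = cosine_similarity(tfidf)

    # View 2: per-document topic proportions from an LDA topic model.
    counts = CountVectorizer().fit_transform(docs)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    topics = lda.fit_transform(counts)
    affinity_topics = cosine_similarity(topics)

    # Fuse the two views by averaging; both matrices are symmetric and
    # non-negative, so the result is a valid precomputed affinity.
    affinity = np.clip((affinity_terms + affinity_topics) / 2.0, 0.0, 1.0)

    model = SpectralClustering(
        n_clusters=n_clusters, affinity="precomputed", random_state=seed
    )
    return model.fit_predict(affinity)


# Hypothetical toy corpus, used only to show the call.
docs = [
    "spectral clustering groups documents by graph similarity",
    "text clustering with topic models and spectral methods",
    "clustering algorithms partition documents into groups",
    "the football team won the league match yesterday",
    "the striker scored two goals in the football match",
    "fans celebrated the team victory after the match",
]
labels = two_view_spectral_clustering(docs, n_clusters=2)
print(labels)
```

On a corpus this small the LDA view is noisy, so the printed labels mainly confirm that the pipeline runs; evaluating clustering accuracy would need a labeled dataset such as the one used in the paper's experiments.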