Research on Text Clustering Based on Sentence Vector and Convolutional Neural Network

Original title: 基于句向量和卷积神经网络的文本聚类研究

Times cited: 6

Authors: JIA Junxia (贾君霞)[1]; WANG Huizhen (王会真); REN Kai (任凯); KANG Wen (康文)

Affiliations: [1] School of Electronics and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China; [2] Guodian Gansu New Energy Co., Ltd., Lanzhou 730070, China

Source: Computer Engineering and Applications (《计算机工程与应用》), 2022, No. 16, pp. 123-128 (6 pages)

Funding: National Natural Science Foundation of China (51867012); Gansu Provincial Science and Technology Plan Project (1504WKCA016).

Abstract: In text clustering, text features are high-dimensional, and common representations ignore the order and semantics of the words in a document. To address these problems, this paper proposes a text feature extraction method for text clustering based on sentence vectors (Doc2vec) and convolutional neural networks (CNN). First, the Doc2vec model converts each text in the training dataset into a sentence vector, taking word order and semantics into account. Next, a CNN extracts deep semantic features from the text, reducing the feature dimensionality and producing text feature vectors suitable for clustering. Finally, the k-means algorithm clusters these feature vectors. Experiments on crawled Sogou news data show that the proposed text clustering model reaches an accuracy of 0.776 and an F-score of 0.780, an improvement over the other text clustering models compared.
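The final stage of the pipeline (k-means on the extracted feature vectors) and the two reported metrics can be illustrated with a small, dependency-free sketch. It assumes the Doc2vec + CNN features have already been computed; the toy vectors, the function names (`kmeans`, `clustering_accuracy`, `clustering_f_measure`) and both metric definitions are hypothetical choices for illustration. Accuracy here uses the best one-to-one mapping of clusters to classes, and the F-score is the common class-weighted best-match F-measure; the abstract does not say which exact definitions the paper uses.

```python
import math
import random
from itertools import permutations


def kmeans(vectors, k, iters=100, seed=0):
    """Plain Lloyd's k-means with Euclidean distance; returns one label per vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]  # random initial centroids
    labels = None
    for _ in range(iters):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments unchanged, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def clustering_accuracy(true, pred, k):
    """Assumed definition: accuracy under the best one-to-one cluster-to-class
    mapping, found by brute force (only practical for small k)."""
    best = 0
    for perm in permutations(range(k)):  # perm[p] = class assigned to cluster p
        hits = sum(1 for t, p in zip(true, pred) if perm[p] == t)
        best = max(best, hits)
    return best / len(true)


def clustering_f_measure(true, pred):
    """Assumed definition: for each class, take the best F over all clusters,
    then average weighted by class size."""
    n = len(true)
    total = 0.0
    for cls in set(true):
        n_i = sum(1 for t in true if t == cls)
        best = 0.0
        for clu in set(pred):
            n_j = sum(1 for p in pred if p == clu)
            n_ij = sum(1 for t, p in zip(true, pred) if t == cls and p == clu)
            if n_ij:
                prec, rec = n_ij / n_j, n_ij / n_i
                best = max(best, 2 * prec * rec / (prec + rec))
        total += n_i / n * best
    return total


if __name__ == "__main__":
    # Hypothetical 3-dimensional "CNN features" for six documents in two topics.
    feats = [[0.9, 0.1, 0.0], [1.0, 0.0, 0.1], [0.8, 0.2, 0.1],
             [0.0, 0.9, 1.0], [0.1, 1.0, 0.9], [0.2, 0.8, 1.0]]
    gold = [0, 0, 0, 1, 1, 1]
    pred = kmeans(feats, k=2)
    print(clustering_accuracy(gold, pred, 2), clustering_f_measure(gold, pred))
```

With many news categories, the factorial brute-force search in `clustering_accuracy` becomes impractical; an assignment solver such as SciPy's `linear_sum_assignment` (Hungarian method) finds the same optimal mapping in polynomial time.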

Keywords: convolutional neural network (CNN); Doc2vec; text representation; text clustering

CLC number: TP391 [Automation and Computer Technology / Computer Application Technology]
