Text Similarity Detection Method Based on the HybridDL Model (Cited by: 3)

Authors: Xiao Han; Mao Xuesong [1]; Zhu Zede

Affiliations: [1] School of Information Science and Engineering, Wuhan University of Science and Technology, Wuhan 430081, Hubei, China; [2] Institute of Technology Innovation, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, Anhui, China

Source: Application of Electronic Technique (《电子技术应用》), 2020, No. 6, pp. 28-31, 35 (5 pages in total)

Funding: National Natural Science Foundation of China (61806187).

Abstract: To improve the accuracy of text similarity detection, this paper proposes a method that combines latent Dirichlet allocation (LDA) with the Doc2Vec model; the resulting model is named HybridDL. The algorithm first trains Doc2Vec on the documents to obtain document vectors. It then uses an LDA model to obtain each document's topics and the probability of the feature words under each topic, computes the probability-weighted sum over the topics and feature words of each document, and maps the result onto the Doc2Vec document vector. Experimental results show that the new model is more sensitive than the traditional Doc2Vec model when judging similar texts and achieves higher accuracy in text similarity detection.
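The combination step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the weighted sum is mapped onto the document vector, so everything below is an assumption made for illustration: the function names (`weighted_topic_vector`, `hybrid_vector`, `cosine`), the use of word vectors as the carrier for the topic weights, the element-wise addition to the Doc2Vec vector, and the toy numbers. In practice the inputs would come from trained Doc2Vec and LDA models rather than hand-written dictionaries.

```python
import math

def weighted_topic_vector(doc_topics, topic_words, word_vecs, dim):
    """Sum word vectors weighted by P(topic | doc) * P(word | topic)."""
    acc = [0.0] * dim
    for topic, p_topic in doc_topics.items():
        for word, p_word in topic_words[topic].items():
            vec = word_vecs.get(word)
            if vec is None:  # skip feature words without a vector
                continue
            weight = p_topic * p_word
            for i in range(dim):
                acc[i] += weight * vec[i]
    return acc

def hybrid_vector(doc_vec, doc_topics, topic_words, word_vecs):
    """Assumed mapping: add the topic-weighted vector to the document vector."""
    topic_vec = weighted_topic_vector(doc_topics, topic_words, word_vecs, len(doc_vec))
    return [d + t for d, t in zip(doc_vec, topic_vec)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Toy stand-ins for LDA output: P(word | topic) for two topics.
topic_words = {
    0: {"cat": 0.6, "dog": 0.4},
    1: {"stock": 0.7, "bond": 0.3},
}
# Toy stand-ins for word vectors in the same space as the document vectors.
word_vecs = {
    "cat": [1.0, 0.0], "dog": [0.8, 0.2],
    "stock": [0.0, 1.0], "bond": [0.2, 0.8],
}
# Two documents with identical (toy) Doc2Vec vectors but different topic mixes.
doc_vec_a, topics_a = [0.5, 0.5], {0: 0.9, 1: 0.1}
doc_vec_b, topics_b = [0.5, 0.5], {0: 0.1, 1: 0.9}

hybrid_a = hybrid_vector(doc_vec_a, topics_a, topic_words, word_vecs)
hybrid_b = hybrid_vector(doc_vec_b, topics_b, topic_words, word_vecs)

print("plain similarity: ", cosine(doc_vec_a, doc_vec_b))
print("hybrid similarity:", cosine(hybrid_a, hybrid_b))
```

In this toy setup the two documents are indistinguishable by their document vectors alone, while the topic-weighted term pulls them apart. This only illustrates the general idea of letting topic information adjust the similarity score; it is not a reproduction of the paper's experiments.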

Keywords: Doc2Vec; latent Dirichlet allocation; text representation; text similarity

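Classification: TN957.52 [Electronics and Telecommunications: Signal and Information Processing]; TP391.1 [Electronics and Telecommunications: Information and Communication Engineering]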
