Text similarity calculation based on multi-model weighted fusion (多模型加权融合的文本相似度计算)

Cited by: 7

Authors: TIAN Hong-peng (田红鹏) [1]; MA Bo (马博); FENG Jian (冯健) [1] (College of Computer Science and Technology, Xi'an University of Science and Technology, Xi'an 710600, China)

Affiliation: [1] College of Computer Science and Technology, Xi'an University of Science and Technology, Xi'an, Shaanxi 710600

Source: Computer Engineering and Design (《计算机工程与设计》), 2021, No. 11, pp. 3239-3245 (7 pages)

Funding: Natural Science Basic Research Program of Shaanxi Province (2020JM-533).

Abstract: Most traditional text similarity methods do not consider semantic and structural information, and they tend to overlook the fine-grained details of text features. To address these problems, a text similarity calculation algorithm based on multi-model weighted fusion is proposed. First, three features (word frequency, part of speech, and word and sentence position) are used jointly to calculate sentence similarity. Second, to capture the structural information of the text, a hierarchical pooling model, IIG-SIF, is proposed to calculate text similarity. Finally, the similarity models from the first two steps are combined in a linear weighted model, so that the two algorithms together give a more accurate result. Experimental results show that the proposed algorithm improves accuracy and recall and obtains better results on datasets of different languages and granularities.

Keywords: text similarity; feature fusion; word mover's distance; hierarchical pooling; sentence vector

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
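
Note: The abstract describes combining two similarity models through linear weighting, but this record does not give the weights or the component formulas. The sketch below only illustrates what a linear weighted fusion of two similarity scores can look like. The function name `fuse_similarity`, the parameter `alpha`, and the assumption that both scores lie in [0, 1] are hypothetical and are not taken from the paper.

```python
def fuse_similarity(sim_features: float, sim_pooled: float, alpha: float = 0.5) -> float:
    """Linearly combine two similarity scores.

    sim_features: score from a feature-based model (hypothetical input,
                  e.g. built on word frequency, part of speech, position).
    sim_pooled:   score from a sentence-vector model (hypothetical input,
                  e.g. a hierarchical-pooling model).
    alpha:        weight on sim_features; (1 - alpha) goes to sim_pooled.
                  The value 0.5 is an arbitrary default, not the paper's.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * sim_features + (1.0 - alpha) * sim_pooled


if __name__ == "__main__":
    # Equal weighting of two example scores.
    print(fuse_similarity(0.8, 0.6, alpha=0.5))
```

In practice the weight would be tuned on held-out data; how the authors chose it is not stated in this record.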
