Authors: 田红鹏 (TIAN Hong-peng) [1], 马博 (MA Bo), 冯健 (FENG Jian) [1]
Affiliation: [1] College of Computer Science and Technology, Xi'an University of Science and Technology, Xi'an, Shaanxi 710600, China
Source: Computer Engineering and Design (《计算机工程与设计》), 2021, No. 11, pp. 3239-3245 (7 pages)
Funding: Natural Science Basic Research Program of Shaanxi Province (2020JM-533).
Abstract: Most traditional text similarity methods do not account for semantic and structural information, and they tend to overlook fine-grained details of text features. To address these problems, a text similarity algorithm based on weighted fusion of multiple models is proposed. First, three features (word frequency, part of speech, and word and sentence position) are used jointly to compute sentence similarity. Second, to capture the structural information of the text, a hierarchical pooling method, IIG-SIF, is proposed for computing text similarity. Finally, the two similarity models are combined in a linear weighted model so that the fused result is more accurate. Experimental results show that the proposed algorithm improves accuracy and recall and obtains better results on datasets of different languages and granularities.
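Keywords: text similarity; feature fusion; word mover's distance; hierarchical pooling; sentence vector
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]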
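The abstract describes the final step as a linear weighted model that merges two similarity scores. The record does not give the formula or the weights, so the following is only a minimal sketch under stated assumptions: the fusion is taken to be a convex combination, and the names `fuse_similarity`, `sim_feature`, `sim_pooled`, and `alpha` are hypothetical and not taken from the paper.

```python
def fuse_similarity(sim_feature: float, sim_pooled: float, alpha: float = 0.5) -> float:
    """Linearly combine two similarity scores.

    sim_feature: score from a feature-based sentence similarity model
                 (assumed to lie in [0, 1]).
    sim_pooled:  score from a pooling-based sentence-vector model
                 (assumed to lie in [0, 1]).
    alpha:       weight given to sim_feature; this parameter and its default
                 of 0.5 are assumptions, not values from the paper.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    # Convex combination: the result stays between the two input scores.
    return alpha * sim_feature + (1.0 - alpha) * sim_pooled


if __name__ == "__main__":
    # Illustrative inputs only; these are not results from the paper.
    print(fuse_similarity(0.8, 0.6, alpha=0.5))
```

In a setup like this, `alpha` would normally be tuned on held-out data; setting it to 0 or 1 reduces the fused score to one of the two underlying models.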