Research on Multi-Feature Sentence Similarity Computing Method with Word Embedding

Cited by: 14


Authors: LI Feng (李峰) [1,2]; HOU Jiaying (侯加英); ZENG Rongren (曾荣仁); LING Chen (凌晨) [1]

Affiliations: [1] Logistics Science Research Institute of PLA, Beijing 100166, China; [2] School of Computer Science and Engineering, Beihang University, Beijing 100191, China; [3] School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650504, China

Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), 2017, No. 4, pp. 608-618 (11 pages)

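Funding: National Natural Science Foundation of China, No. 61370126; National High Technology Research and Development Program of China (863 Program), No. 2015AA016004; National Social Science Foundation of China, No. 15GJ003-154; Exploratory Independent Research Project Fund of the State Key Laboratory of Software Development Environment, No. SKLSDE-2015ZX-16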

Abstract: After summarizing common sentence similarity computing methods, this paper trains a word vector model for semantic similarity computing on more than 34,000 texts from People's Daily, and designs a multi-feature sentence similarity computing method that incorporates word vectors. At the word level, the method considers the number of overlapping words in the two sentences and the continuity of those words, and uses the word vector model to measure the similarity between non-overlapping words. At the structure level, it considers the order of the overlapping words and the length consistency of the two sentences. For the experiments, four sentence similarity computing methods were designed and implemented, and a corresponding experimental system was developed. The results show that the proposed algorithm achieves relatively good results, and that combining and optimizing the semantic features of words with the structural features of sentences can improve the accuracy of sentence similarity computation.
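The five features named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything below is an assumption made for illustration: the definition of each feature score, the weights in `WEIGHTS`, the function names, and the two-dimensional toy vectors (a trained Word2vec model would supply real vectors). Sentences are assumed to be already segmented into word tokens.

```python
import math

# Hypothetical feature weights (not from the paper); they sum to 1.
WEIGHTS = {"overlap": 0.3, "continuity": 0.1, "embedding": 0.3,
           "order": 0.15, "length": 0.15}

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def overlap_score(s1, s2):
    """Dice coefficient over the two sets of word types."""
    a, b = set(s1), set(s2)
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 0.0

def continuity_score(s1, s2):
    """Longest run of consecutive shared words, relative to the shorter sentence."""
    best, prev = 0, [0] * (len(s2) + 1)
    for x in s1:
        cur = [0] * (len(s2) + 1)
        for j, y in enumerate(s2, start=1):
            if x == y:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    shorter = min(len(s1), len(s2))
    return best / shorter if shorter else 0.0

def embedding_score(s1, s2, vectors):
    """Average best-match cosine between the non-overlapping words of each side."""
    rest1 = [w for w in s1 if w not in s2]
    rest2 = [w for w in s2 if w not in s1]
    if not rest1 and not rest2:
        return 1.0  # every word is shared
    if not rest1 or not rest2:
        return 0.0  # nothing to match against

    def best_match(word, others):
        if word not in vectors:
            return 0.0
        sims = [cosine(vectors[word], vectors[o]) for o in others if o in vectors]
        return max(0.0, max(sims, default=0.0))  # negative cosines count as 0

    scores = [best_match(w, rest2) for w in rest1]
    scores += [best_match(w, rest1) for w in rest2]
    return sum(scores) / len(scores)

def order_score(s1, s2):
    """Share of adjacent shared-word pairs that keep their relative order."""
    shared = [w for w in s1 if s1.count(w) == 1 and s2.count(w) == 1]
    if not shared:
        return 0.0
    if len(shared) == 1:
        return 1.0
    pos = [s2.index(w) for w in shared]
    kept = sum(1 for a, b in zip(pos, pos[1:]) if a < b)
    return kept / (len(pos) - 1)

def length_score(s1, s2):
    """1 when both sentences have the same length, lower as lengths diverge."""
    total = len(s1) + len(s2)
    return 1 - abs(len(s1) - len(s2)) / total if total else 0.0

def sentence_similarity(s1, s2, vectors, weights=WEIGHTS):
    """Weighted sum of the five feature scores; result lies in [0, 1]."""
    scores = {
        "overlap": overlap_score(s1, s2),
        "continuity": continuity_score(s1, s2),
        "embedding": embedding_score(s1, s2, vectors),
        "order": order_score(s1, s2),
        "length": length_score(s1, s2),
    }
    return sum(weights[name] * scores[name] for name in weights)

# Toy vectors standing in for a trained word vector model.
toy_vectors = {"like": [1.0, 0.0], "love": [0.8, 0.6], "hate": [-1.0, 0.0]}
print(sentence_similarity(["i", "like", "apples"], ["i", "love", "apples"], toy_vectors))
```

In this sketch the embedding feature is the only one that can tell "like" from "hate" when the rest of the sentence is identical, which is the role the abstract assigns to the word vector model for non-overlapping words.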

Keywords: word embedding; sentence similarity; Word2vec; algorithm design

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
