A short text similarity calculation method based on semantics and syntactic structure  (Cited by: 19)


Authors: ZHAO Qian (赵谦), JING Qi (荆琪), LI Ai-ping (李爱萍), DUAN Li-guo (段利国) [1] (College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024; State Key Laboratory of Software Engineering, Wuhan University, Wuhan 430072, China)

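Affiliations: [1] College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, Shanxi, China; [2] State Key Laboratory of Software Engineering, Wuhan University, Wuhan 430072, Hubei, China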

Source: Computer Engineering & Science (《计算机工程与科学》), 2018, No. 7, pp. 1287-1294 (8 pages)

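Funding: Open Project of the State Key Laboratory of Software Engineering, Wuhan University (SKLSE2012-09-30); Natural Science Foundation of Shanxi Province (2013011015-2)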

Abstract: To improve the accuracy of short text semantic similarity calculation, we propose a new method. The short text is first segmented into sentence units, and syntactic dependency analysis is performed on each sentence. Sentence similarity is built on word similarity. When computing the semantic similarity of words, we take a new word feature into account, namely the emotional feature, and propose a combined method for word sense disambiguation that uses both the part of speech of a word and the context in which it occurs; word semantic similarity is then computed with the HowNet semantic dictionary. The semantic similarity of two sentences is obtained as a weighted average of the semantic similarities between their words, with the weights determined by sentence structure. Finally, the semantic similarity of short texts is computed with a new method called the binary set method. The accuracy of word similarity and of short text similarity reaches 87.63% and 93.77%, respectively. The experimental results show that the proposed method improves the accuracy of short text semantic similarity.
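The sentence-level step described in the abstract (a weighted average of word similarities, with weights chosen according to sentence structure) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the role weights in ROLE_WEIGHTS, the rule of matching each word to the best-scoring word with the same dependency role, and the toy lookup table TOY_SIM that stands in for a HowNet-based word similarity. The word sense disambiguation, the emotional feature, and the binary set method are not sketched, because the abstract does not describe how they work.

```python
from typing import Callable, Dict, List, Tuple

# A token is (word, dependency role). The role labels are illustrative only.
Token = Tuple[str, str]

# Hypothetical weights per dependency role; the paper's weights are not given
# in the abstract.
ROLE_WEIGHTS: Dict[str, float] = {"HED": 0.5, "SBV": 0.3, "VOB": 0.2}
DEFAULT_WEIGHT = 0.1  # assumed weight for any role not listed above

# Toy word similarity table standing in for a HowNet-based measure.
TOY_SIM: Dict[frozenset, float] = {
    frozenset({"buy", "purchase"}): 0.9,
    frozenset({"car", "vehicle"}): 0.8,
}


def word_sim(a: str, b: str) -> float:
    """Return 1.0 for identical words, a table value if known, else 0.0."""
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset({a, b}), 0.0)


def sentence_sim(
    s1: List[Token],
    s2: List[Token],
    sim: Callable[[str, str], float] = word_sim,
) -> float:
    """Weighted average over s1 of the best same-role word match in s2."""
    weighted_sum = 0.0
    total_weight = 0.0
    for word, role in s1:
        weight = ROLE_WEIGHTS.get(role, DEFAULT_WEIGHT)
        # Best similarity among words in s2 that fill the same role.
        best = max((sim(word, w2) for w2, r2 in s2 if r2 == role), default=0.0)
        weighted_sum += weight * best
        total_weight += weight
    return weighted_sum / total_weight if total_weight else 0.0


def symmetric_sentence_sim(s1: List[Token], s2: List[Token]) -> float:
    """Average both directions so the score does not depend on argument order."""
    return (sentence_sim(s1, s2) + sentence_sim(s2, s1)) / 2


if __name__ == "__main__":
    s1 = [("I", "SBV"), ("buy", "HED"), ("car", "VOB")]
    s2 = [("I", "SBV"), ("purchase", "HED"), ("vehicle", "VOB")]
    print(round(symmetric_sentence_sim(s1, s2), 2))
```

In this sketch the head word carries the largest weight, so a mismatch on the main predicate lowers the score more than a mismatch on an object. That is one plausible reading of "weighted according to sentence structure", not a confirmed detail of the method.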

Keywords: word sense disambiguation; emotional feature; syntactic dependency analysis; short text semantic similarity

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
