Chinese sentence similarity calculation based on dependency syntax and word semantics (original title: 基于依存句法与词语语义的汉语句子相似度计算)

Cited by: 1


Authors: SHEN Zhen (申震); WANG Xun (王逊); HUANG Shucheng (黄树成); ZHOU Erhao (周尓昊)

Affiliation: [1] School of Computer Science, Jiangsu University of Science and Technology, Zhenjiang 212100, China

Source: Journal of Jiangsu University of Science and Technology: Natural Science Edition, 2022, No. 2, pp. 65-72 (8 pages)

Funding: National Natural Science Foundation of China (61772244).

Abstract: Existing sentence similarity methods consider only the semantic information of individual words and ignore the grammatical structure of the sentence. To address this, a calculation method that combines dependency parsing with word semantic similarity is proposed. The Language Technology Platform (LTP), developed by the Research Center for Social Computing and Information Retrieval at Harbin Institute of Technology, is used to parse each sentence into a dependency tree, from which dependency triples are constructed that carry multiple features, including sentence components, dependency relations, and words. A dynamic weighting scheme makes full use of the semantic information of words in HowNet and Tongyici Cilin, and the dependency triples capture semantic information at two levels: the grammatical structure of the sentence and the semantics of its words. This makes the similarity calculation more reasonable and enlarges the set of words for which similarity can be computed. Experiments show that the method is somewhat more accurate than comparable methods and measures the similarity between sentences more accurately.
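The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the weighting formula, the triple layout, or the matching strategy, so everything below is an assumption made for illustration. In particular, the `Triple` fields, the score-proportional weighting in `word_similarity`, the best-match averaging in `sentence_similarity`, and the toy lookup tables standing in for HowNet and Tongyici Cilin are hypothetical. The triples are written by hand rather than produced by LTP; `SBV` and `VOB` are LTP's labels for subject-verb and verb-object relations.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence, Tuple

# A word-similarity source returns a score in [0, 1], or None if it
# does not cover the word pair (hypothetical interface).
WordSim = Callable[[str, str], Optional[float]]


@dataclass(frozen=True)
class Triple:
    """Simplified dependency triple: (head word, relation label, dependent word)."""
    head: str
    relation: str
    dependent: str


def make_lookup(table: Dict[Tuple[str, str], float]) -> WordSim:
    """Wrap a toy table as a symmetric similarity source (stand-in for a lexicon)."""
    def sim(a: str, b: str) -> Optional[float]:
        if (a, b) in table:
            return table[(a, b)]
        return table.get((b, a))
    return sim


def word_similarity(a: str, b: str, hownet_sim: WordSim, cilin_sim: WordSim) -> float:
    """Combine two sources; the weighting rule here is an assumed example."""
    if a == b:
        return 1.0
    s_h, s_c = hownet_sim(a, b), cilin_sim(a, b)
    if s_h is None and s_c is None:
        return 0.0          # neither lexicon covers the pair
    if s_h is None:
        return s_c          # fall back to the only available source
    if s_c is None:
        return s_h
    total = s_h + s_c
    if total == 0:
        return 0.0
    # Assumed dynamic weights: each source is weighted by its share of the total.
    return (s_h * s_h + s_c * s_c) / total


def triple_similarity(t1: Triple, t2: Triple, wsim: Callable[[str, str], float]) -> float:
    """Triples with different relations score 0; otherwise average the word scores."""
    if t1.relation != t2.relation:
        return 0.0
    return 0.5 * (wsim(t1.head, t2.head) + wsim(t1.dependent, t2.dependent))


def sentence_similarity(ta: Sequence[Triple], tb: Sequence[Triple],
                        wsim: Callable[[str, str], float]) -> float:
    """Symmetric average of each triple's best match in the other sentence."""
    if not ta or not tb:
        return 0.0

    def directed(src: Sequence[Triple], dst: Sequence[Triple]) -> float:
        return sum(max(triple_similarity(s, d, wsim) for d in dst) for s in src) / len(src)

    return 0.5 * (directed(ta, tb) + directed(tb, ta))


# Toy data: invented scores, not values from HowNet or Tongyici Cilin.
hownet = make_lookup({("喜欢", "喜爱"): 0.9})
cilin = make_lookup({("喜欢", "喜爱"): 0.8, ("苹果", "香蕉"): 0.6})


def wsim(a: str, b: str) -> float:
    return word_similarity(a, b, hownet, cilin)


sent_a = [Triple("喜欢", "SBV", "我"), Triple("喜欢", "VOB", "苹果")]
sent_b = [Triple("喜爱", "SBV", "我"), Triple("喜爱", "VOB", "香蕉")]
print(sentence_similarity(sent_a, sent_b, wsim))
```

Falling back to a single lexicon when the other does not cover a word pair is one way to read the abstract's claim that combining the two resources enlarges the set of words for which similarity can be computed. Requiring matching relation labels is what lets sentence structure, and not only vocabulary, affect the score.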

Keywords: similarity calculation; dependency parsing; HowNet; Cilin; sentence similarity; word similarity

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

 
