Word Similarity Calculation Method Based on Path and CiLin Coding

Times cited: 8


Authors: WANG Songsong (王松松), GAO Weixun (高伟勋), XU Yifan (徐逸凡) (College of Information and Mechatronic Engineering, Shanghai Normal University, Shanghai 200134, China)

Affiliation: [1] College of Information and Mechatronic Engineering, Shanghai Normal University, Shanghai 200134

Source: Computer Engineering (《计算机工程》), 2018, No. 10, pp. 160-167 (8 pages)

Abstract: Existing word similarity calculation methods rely mainly on the path structure between words and give little consideration to the words' semantic information, so their results are not accurate enough. To address this problem, an improved method for computing the semantic similarity of words is proposed. The method combines the CiLin coding of words with the path structure, and uses the locality-sensitive hashing (LSH) algorithm together with the Hamming distance to compute the similarity between CiLin codes. Experimental results on the MC and RG datasets show that the method achieves Pearson correlation coefficients of 0.8974 and 0.8668, respectively. These results are more accurate than those of traditional path-based and depth-based calculation methods.
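The abstract does not say how the hash is built or how the two scores are merged. The following is a minimal Python sketch that illustrates the general idea under several stated assumptions. SimHash stands in for the locality-sensitive hash. CiLin codes are assumed to follow the five-level layout of the extended HIT CiLin (for example `Aa01A01=`, with the trailing `=`/`#`/`@` flag ignored). Deeper levels receive larger feature weights. The path score uses the form 1 / (1 + path length), and the final score is a weighted sum controlled by `alpha`. All function names, the weights, and `alpha` are hypothetical and are not taken from the paper.

```python
import hashlib

BITS = 64  # fingerprint width (assumed; the paper's value is not given in the abstract)


def cilin_features(code):
    """Split a CiLin code such as 'Aa01A01=' into one prefix per level (1-5).

    Assumed layout: level 1 = 1 char, level 2 = 1 char, level 3 = 2 digits,
    level 4 = 1 char, level 5 = 2 digits; the 8th flag character is ignored.
    Deeper levels get larger weights (a hypothetical weighting scheme).
    """
    cuts = (1, 2, 4, 5, 7)
    return [(code[:c], level + 1) for level, c in enumerate(cuts)]


def feature_hash(feature):
    """Map a feature string to a 64-bit integer using the first 8 bytes of MD5."""
    digest = hashlib.md5(feature.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def simhash(code):
    """SimHash fingerprint of a CiLin code (SimHash is one LSH scheme)."""
    totals = [0] * BITS
    for feature, weight in cilin_features(code):
        h = feature_hash(feature)
        for i in range(BITS):
            # Add the weight where the feature's bit is 1, subtract it where it is 0.
            totals[i] += weight if (h >> i) & 1 else -weight
    fingerprint = 0
    for i in range(BITS):
        if totals[i] > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a, b):
    """Number of differing bits between two integer fingerprints."""
    return bin(a ^ b).count("1")


def code_similarity(code1, code2):
    """Similarity in [0, 1]: 1 - (Hamming distance / fingerprint width)."""
    return 1.0 - hamming(simhash(code1), simhash(code2)) / BITS


def path_similarity(path_len):
    """Path-based score 1 / (1 + shortest path length) (assumed form)."""
    return 1.0 / (1.0 + path_len)


def combined_similarity(code1, code2, path_len, alpha=0.5):
    """Hypothetical weighted combination of the path score and the code score."""
    return alpha * path_similarity(path_len) + (1 - alpha) * code_similarity(code1, code2)


if __name__ == "__main__":
    print(code_similarity("Aa01A01=", "Aa01A01="))  # identical codes -> 1.0
    print(combined_similarity("Aa01A01=", "Ab01A01=", path_len=2))
```

SimHash fits here because fingerprints of inputs with many shared features tend to differ in fewer bits, so a Hamming distance on short fingerprints can approximate how much of the code hierarchy two words share. This tendency is statistical, not guaranteed for every pair. The path-score form matches the one used by NLTK's WordNet `path_similarity`. The paper's actual path measure and combination formula may differ.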

Keywords: synonym; path structure; coding; word similarity; locality-sensitive hashing algorithm; semantics

CLC number: TP391 [Automation and Computer Technology / Computer Application Technology]
