Research of a New Algorithm of Words Similarity Based on Information Entropy (基于信息熵的新的词语相似度算法研究)

Cited by: 3


Authors: Wang Xiaolin (王小林)[1], Lu Luoyong (陆骆勇), Tai Weipeng (邰伟鹏)[1]

Affiliation: [1] School of Computer Science and Technology, Anhui University of Technology, Ma'anshan, Anhui 243002, China

Source: Computer Technology and Development (《计算机技术与发展》), 2015, No. 9, pp. 119-122 (4 pages)

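Funding: Key Project of Natural Science Research of Anhui Provincial Universities (KJ2013Z023; KJ2013A058); Anhui Province Revitalization Plan funded project (2013ZDJY073)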

Abstract: Word similarity computation is widely used in natural language processing. To address the reasonableness of word similarity results, this paper studies the three levels of concepts in HowNet (words, senses, and sememes) and proposes a new word similarity method that incorporates the concept of entropy from information theory. First, word-surface similarity is introduced to select words reasonably from a word set. Second, the weight of each sememe is balanced according to its information entropy. This keeps common sememes from taking too large a share of a word's sememe set, which would otherwise make the computed result deviate noticeably from reality. Experimental results show that, compared with traditional methods, the proposed method produces no overly absolute results such as 1.000, which improves the reasonableness of the results. In addition, the experiments operate on a word set rather than on a single pair of words, which the authors take as evidence that comparison efficiency also improves.
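The abstract does not give the formulas, so the following is only a minimal sketch of the general idea of down-weighting common sememes, not the authors' algorithm. It assumes a toy lexicon (hypothetical, not taken from HowNet), uses the self-information -log2(p) of each sememe as its weight, and compares two sememe sets with a weighted Jaccard ratio. All names (`lexicon`, `sememe_weights`, `weighted_similarity`) are illustrative.

```python
import math
from collections import Counter

# Hypothetical toy lexicon: word -> set of sememes (not real HowNet data).
lexicon = {
    "doctor": {"human", "occupation", "medical", "cure"},
    "nurse": {"human", "occupation", "medical", "care"},
    "teacher": {"human", "occupation", "education"},
    "student": {"human", "education"},
}


def sememe_weights(lex):
    """Weight each sememe by its self-information -log2(p).

    p is the fraction of words whose sememe set contains the sememe,
    so a sememe shared by every word gets weight 0.
    """
    counts = Counter(s for sememes in lex.values() for s in sememes)
    n = len(lex)
    return {s: -math.log2(c / n) for s, c in counts.items()}


def jaccard(a, b):
    """Unweighted baseline: every sememe counts equally."""
    return len(a & b) / len(a | b)


def weighted_similarity(a, b, weights):
    """Weighted Jaccard: shared weight divided by total weight of the union."""
    total = sum(weights[s] for s in a | b)
    if total == 0:
        return 0.0  # only zero-weight (uninformative) sememes involved
    return sum(weights[s] for s in a & b) / total


weights = sememe_weights(lexicon)
doctor, nurse, student = lexicon["doctor"], lexicon["nurse"], lexicon["student"]

print(jaccard(doctor, student), weighted_similarity(doctor, student, weights))
print(jaccard(doctor, nurse), weighted_similarity(doctor, nurse, weights))
```

In this toy data the sememe "human" appears in every word, so it contributes nothing to the weighted score, whereas the unweighted baseline still credits "doctor" and "student" for sharing it. Other weighting schemes (for example smoothed probabilities) would be equally plausible readings of the abstract.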

Keywords: word similarity; HowNet; sememe; information entropy; word-surface similarity

Classification number: TP301.6 [Automation and Computer Technology: Computer System Architecture]

 
