Affiliation: [1] School of Computer Science and Technology, Anhui University of Technology, Ma'anshan, Anhui 243002
Source: Computer Technology and Development (《计算机技术与发展》), 2015, No. 9, pp. 119-122 (4 pages)
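Funding: Key Natural Science Research Project of Anhui Provincial Universities (KJ2013Z023; KJ2013A058); Anhui Province Revitalization Plan Funded Project (2013ZDJY073)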
Abstract: Word similarity computation is widely used in natural language processing. To address the reasonableness of word similarity results, this paper studies the three conceptual levels in HowNet (words, concepts, and sememes) and proposes a new word similarity method that incorporates the notion of entropy from information theory. First, word-surface similarity is introduced so that words are selected from the word set in a reasonable way. Second, the weight of each sememe is balanced according to its information entropy, which keeps common sememes from taking too large a share of a word's sememe set and thereby causing results that deviate noticeably from reality. Experimental results show that, compared with traditional methods, the proposed method does not produce overly absolute values such as 1.000, which improves the reasonableness of the results. In addition, the experiments operate on word sets rather than on pairs of words, which indicates that the efficiency of comparison is also improved.
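The abstract gives no formulas, so the following is only a minimal sketch of the general idea of down-weighting common sememes by their information content. It is not the paper's algorithm: the weighting by self-information (-log2 of a sememe's relative frequency across the word set), the weighted-overlap similarity, the function names, and the toy sememe sets are all assumptions made for illustration, and the word-surface selection step is not modelled.

```python
import math
from collections import Counter

def sememe_weights(word_sememes):
    """Weight each sememe by its self-information -log2(p) (assumed scheme).

    p is the fraction of words in the set whose sememe set contains the
    sememe, so a sememe shared by many words receives a small weight.
    """
    n = len(word_sememes)
    counts = Counter(s for sememes in word_sememes.values() for s in set(sememes))
    return {s: -math.log2(c / n) for s, c in counts.items()}

def weighted_similarity(a, b, weights):
    """Weighted overlap: total weight of shared sememes over total weight of all."""
    sa, sb = set(a), set(b)
    union = sum(weights[s] for s in sa | sb)
    if union == 0:
        # Every sememe involved occurs in all words and carries no information.
        return 0.0
    return sum(weights[s] for s in sa & sb) / union

def plain_jaccard(a, b):
    """Unweighted baseline for comparison."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

# Hypothetical toy data; these are not real HowNet entries.
word_sememes = {
    "doctor": {"human", "occupation", "cure"},
    "patient": {"human", "disease", "suffer"},
    "teacher": {"human", "occupation", "teach"},
    "hospital": {"place", "cure", "disease"},
}

weights = sememe_weights(word_sememes)
for pair in [("doctor", "teacher"), ("doctor", "patient")]:
    a, b = (word_sememes[w] for w in pair)
    print(pair, round(plain_jaccard(a, b), 3), round(weighted_similarity(a, b, weights), 3))
```

In this toy set the sememe "human" occurs in three of the four words, so it contributes little to the weighted score, and pairs that share mainly that sememe score lower than they do under the unweighted baseline.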
Classification number: TP301.6 [Automation and Computer Technology: Computer System Architecture]