Authors: 郜炎峰 (GAO Yan-feng), 林燕芬 (LIN Yan-fen) [1], 王忠建 (WANG Zhong-jian) [1] (School of Computer and Information Engineering, Harbin University of Commerce, Harbin 150028, China)
Affiliation: [1] School of Computer and Information Engineering, Harbin University of Commerce, Harbin 150028, China
Source: Journal of Harbin University of Commerce: Natural Sciences Edition (《哈尔滨商业大学学报(自然科学版)》), 2017, No. 1, pp. 73-76 (4 pages)
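Funding: Natural Science Foundation of Heilongjiang Province (F201243); Scientific Research Project of the Heilongjiang Provincial Department of Education (12511127)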
Abstract: Sentence similarity computation is an important practical technique in natural language processing. The Markov-model-based method for computing Chinese sentence similarity segments each sentence into words and then builds a feature-word vector and a weight vector. It takes the relation vector model as its basis and, drawing on a study of the characteristics of Chinese sentences, uses the co-occurrence of adjacent words to reweight the weight vector, thereby adjusting the weights of the different feature words. The method focuses on the word-form similarity of keywords, combines it with the similarity of surface information such as sentence length and word order, and also takes synonyms into account. Finally, two different schemes are compared experimentally against the relation vector model. The results show that the method handles similarity computation for two sentences of very different lengths better, with relatively high accuracy, especially when retrieving related news headlines.
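The abstract names the ingredients of the method but not its formulas, so the following is only a minimal sketch of how such ingredients could fit together, not the authors' implementation. It assumes the sentences are already segmented into word lists. The function names, the `boost` value, the synonym table, and the combination weights `w_word`, `w_len`, `w_order` are all hypothetical choices made for illustration.

```python
from collections import Counter
from math import sqrt


def normalize(tokens, synonyms):
    # Map each token to a canonical form so that synonyms compare as equal.
    return [synonyms.get(t, t) for t in tokens]


def weighted_vector(tokens, other_tokens, boost):
    # Start from raw term counts as the weight vector.
    weights = {t: float(c) for t, c in Counter(tokens).items()}
    # Assumed reweighting: an adjacent word pair that also occurs in the
    # other sentence raises the weight of both words in the pair.
    other_pairs = set(zip(other_tokens, other_tokens[1:]))
    for first, second in zip(tokens, tokens[1:]):
        if (first, second) in other_pairs:
            weights[first] += boost
            weights[second] += boost
    return weights


def cosine(u, v):
    # Cosine similarity between two sparse weight vectors.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def length_similarity(a, b):
    # 1.0 for equal lengths, approaching 0.0 as the lengths diverge.
    total = len(a) + len(b)
    return 1.0 - abs(len(a) - len(b)) / total if total else 0.0


def order_similarity(a, b):
    # Consider only words that occur exactly once in both sentences.
    shared = [t for t in a if a.count(t) == 1 and b.count(t) == 1]
    if not shared:
        return 0.0
    if len(shared) == 1:
        return 1.0
    # Fraction of neighbouring shared words that keep their relative order.
    positions = [b.index(t) for t in shared]
    in_order = sum(1 for p, q in zip(positions, positions[1:]) if p < q)
    return in_order / (len(positions) - 1)


def sentence_similarity(a, b, synonyms=None, boost=0.5,
                        w_word=0.7, w_len=0.15, w_order=0.15):
    # Weighted sum of word-form, length, and word-order similarity.
    synonyms = synonyms or {}
    a, b = normalize(a, synonyms), normalize(b, synonyms)
    word_sim = cosine(weighted_vector(a, b, boost),
                      weighted_vector(b, a, boost))
    return (w_word * word_sim
            + w_len * length_similarity(a, b)
            + w_order * order_similarity(a, b))


if __name__ == "__main__":
    # Pre-segmented example sentences; the synonym table is illustrative.
    s1 = ["计算机", "运行", "很", "快"]
    s2 = ["电脑", "运行", "很", "快"]
    print(sentence_similarity(s1, s2, synonyms={"电脑": "计算机"}))
```

In this sketch the word-form term dominates the score, which mirrors the abstract's statement that keyword word-form similarity is the main factor, while sentence length and word order act as smaller corrections. The actual weights and the exact co-occurrence formula would have to come from the paper itself.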