Authors: DUAN Yuguang (段宇光), LIU Yang (刘扬) [1,3], YU Shiwen (俞士汶) [1,3] (Key Laboratory of Computational Linguistics (Ministry of Education), Peking University; Yuanpei College, Peking University; Institute of Computational Linguistics, Peking University, Beijing 100871, China)
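Affiliations: [1] Key Laboratory of Computational Linguistics (Ministry of Education), Peking University; [2] Yuanpei College, Peking University; [3] Institute of Computational Linguistics, Peking University, Beijing 100871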
Source: Journal of Xiamen University: Natural Science (《厦门大学学报(自然科学版)》), 2018, No. 6, pp. 867-875 (9 pages)
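Funding: National Basic Research Program of China (973 Program) (2014CB340504); Major Program of the National Social Science Fund of China (12&ZD119); National Social Science Fund of China (16BYY137)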
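Abstract (translated from the Chinese): In natural language processing, embedded representations are an important means of expressing linguistic knowledge. Taking Tongyici Cilin (《同义词词林》) as an example, this paper proposes a pseudo-sentence construction method for training embedded representations from a knowledge base, and tests the effectiveness of the new method on several tasks. Following the hierarchical structure reflected in the word-sense codes of Tongyici Cilin, the codes are expanded into multiple pseudo-sentence patterns, from which different pseudo-corpora are generated. The word2vec model is trained on these pseudo-corpora to obtain sememe vectors and word vectors, yielding the CiLin2Vec resource, which is then applied to semantic compositionality, analogical reasoning, and word similarity computation. Accuracy on the semantic compositionality and analogical reasoning tasks exceeds 90%, surpassing earlier results obtained by training on corpora. This shows that the method can effectively inject the rational knowledge held in a knowledge base into embedded representations, and it also indicates the considerable application potential of the CiLin2Vec embedding resource.
Abstract (authors' English version): In natural language processing (NLP), learning embedded representations is an effective approach to capturing semantics from language resources. At present, however, this approach has been largely limited to large-scale corpora, with little attention paid to extracting rational knowledge from knowledge bases. In this paper, based on "Tongyici Cilin", a well-known Chinese thesaurus, we present a method for implanting rational knowledge into embedded representations, and then evaluate it on different NLP tasks. According to the hierarchical encodings of morphemic and lexical meanings in "Tongyici Cilin", we design multiple templates to create instances as pseudo-sentences from these pieces of knowledge, and apply word2vec to obtain CiLin2Vec, a new kind of sememe and word embeddings for "Tongyici Cilin". For evaluation, the tasks of semantic compositionality, analogical reasoning, and word similarity measurement are considered. We reach an accuracy of over 90% on both semantic compositionality and analogical reasoning, demonstrating that the rational knowledge has been appropriately implanted and showing promising prospects for the use of knowledge bases.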
Keywords: Tongyici Cilin (《同义词词林》); embedded representation; semantic compositionality; analogical reasoning; similarity
Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
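The pseudo-sentence idea summarized in the abstract (expanding hierarchical sense codes into token sequences that a word2vec-style model can train on) might look roughly like the following sketch. Everything specific here is an assumption made for illustration: the five-level code layout (1+1+2+1+2 characters plus a trailing marker), the single "category path followed by a word" template, the helper names `split_code` and `pseudo_sentences`, and the placeholder entries. The record does not state the actual templates used by the authors.

```python
# Hypothetical sketch: the code layout and the template below are assumptions,
# not the templates used in the paper.

# Assumed widths of the five hierarchy levels inside a code such as "Aa01A01=".
LEVEL_WIDTHS = [1, 1, 2, 1, 2]


def split_code(code):
    """Return the cumulative prefixes of a hierarchical code, coarse to fine.

    The trailing marker character (e.g. '=') is ignored.
    """
    prefixes = []
    end = 0
    for width in LEVEL_WIDTHS:
        end += width
        prefixes.append(code[:end])
    return prefixes


def pseudo_sentences(entries):
    """Build one pseudo-sentence per word: its category path, then the word."""
    sentences = []
    for code, words in entries:
        path = split_code(code)
        for word in words:
            sentences.append(path + [word])
    return sentences


# Placeholder entries; the words are stand-ins, not real thesaurus content.
toy_entries = [
    ("Aa01A01=", ["word_a", "word_b"]),
    ("Aa01A02=", ["word_c"]),
]

corpus = pseudo_sentences(toy_entries)
for sentence in corpus:
    print(" ".join(sentence))
```

The result is a list of token lists, which is the input shape that word2vec implementations generally expect, so words sharing a longer code prefix end up sharing more context tokens.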