Word representation learning model using matrix factorization to incorporate semantic information

Cited by: 1


Authors: 陈培, 景丽萍[1]

Affiliation: [1] Beijing Key Laboratory of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044

Source: CAAI Transactions on Intelligent Systems (《智能系统学报》), 2017, No. 5, pp. 661-667 (7 pages)

Funding: National Natural Science Foundation of China (61370129, 61375062, 61632004); Program for Changjiang Scholars and Innovative Research Team in University (IRT201206)

Abstract: Word representations play an important role in natural language processing and have attracted increasing attention from researchers in recent years. However, traditional methods for learning word representations generally rely on large amounts of unlabeled text, while neglecting the semantic information of words, such as the semantic relationships between them. To make full use of existing domain knowledge bases, which contain rich semantic information about words, this paper proposes a word representation learning method that incorporates semantic information (KbEMF). The method adds domain-knowledge constraint terms to a matrix-factorization model for learning word representations, so that word pairs with strong semantic relationships obtain relatively similar vectors. Results of word analogy reasoning tasks and word similarity measurement tasks on real data show that KbEMF clearly outperforms existing models.
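The abstract gives no equations, so the following is only a minimal sketch of the idea under stated assumptions: factorize a co-occurrence matrix M ≈ W Cᵀ and add a penalty λ Σ ||w_i - w_j||² over word pairs that a knowledge base marks as strongly related, which pulls their vectors together. The toy vocabulary, kb_pairs, lambda_kb, and the exact loss are illustrative assumptions, not the authors' formulation.

```python
# Minimal sketch (not the paper's exact formulation): matrix-factorization word
# embeddings with a semantic-knowledge regularizer. We factorize a toy
# co-occurrence matrix M ~= W @ C.T and add a penalty pulling together the
# vectors of word pairs listed in a knowledge base. All names (kb_pairs,
# lambda_kb, the vocabulary) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

vocab = ["king", "queen", "man", "woman", "apple"]
V, d = len(vocab), 8

# Toy symmetric co-occurrence matrix (stand-in for statistics from a corpus).
M = rng.random((V, V))
M = (M + M.T) / 2

# Knowledge-base pairs assumed to be strongly semantically related.
kb_pairs = [(0, 1), (2, 3)]            # (king, queen), (man, woman)

W = 0.1 * rng.standard_normal((V, d))  # word vectors
C = 0.1 * rng.standard_normal((V, d))  # context vectors
lr, lambda_kb = 0.02, 1.0

for step in range(1000):
    # Reconstruction term: || W C^T - M ||_F^2
    R = W @ C.T - M
    grad_W = 2 * R @ C
    grad_C = 2 * R.T @ W

    # Knowledge constraint: lambda * sum ||w_i - w_j||^2 over KB pairs,
    # which pushes semantically related words toward similar embeddings.
    for i, j in kb_pairs:
        diff = W[i] - W[j]
        grad_W[i] += 2 * lambda_kb * diff
        grad_W[j] -= 2 * lambda_kb * diff

    W -= lr * grad_W
    C -= lr * grad_C

cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print("cos(king, queen):", round(cos(W[0], W[1]), 3))  # constrained pair
print("cos(king, apple):", round(cos(W[0], W[4]), 3))  # unconstrained pair
```

Setting lambda_kb = 0 reduces this to plain matrix factorization, while increasing it tightens the constraint until the paired vectors nearly coincide, which matches the qualitative behavior the abstract describes.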

Keywords: natural language processing; word representation; matrix factorization; semantic information; knowledge base

Classification code: TP391.1 [Automation and Computer Technology / Computer Application Technology]

 
