Word Representation Based on Word Relations (Cited by: 11)


Authors: 蒋振超, 李丽双[1], 黄德根[1]

Affiliation: [1] School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, Liaoning, China

Source: Journal of Chinese Information Processing, 2017, No. 3, pp. 25-31 (7 pages)

Funding: National Natural Science Foundation of China (61672126, 61173101)

Abstract: Word vectors represent word meaning in vector form, and many natural language processing applications now incorporate them, either as extra features or as direct input, to improve system performance. However, most existing training models rely on shallow context-window information and do not fully exploit deep dependency relations. The meaning of a word is embodied in its relations with other words, and each word relation has three attributes: the related item, the relation type, and the relation direction. This paper therefore proposes a new neural-network word-representation model with three top layers, one for each relation attribute, which makes fuller use of word relations during training: it leverages large-scale unlabeled text and trains word vectors on both dependency relations and context relations. The resulting vectors are evaluated on a word analogy task and a protein-protein interaction extraction task to verify the effectiveness of the relation model. Experiments show that, compared with the skip-gram and CBOW models, vectors trained by the relation model express word semantics more accurately.
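The architecture sketched in the abstract (a shared embedding layer feeding three separate top layers that predict the related item, the relation type, and the relation direction) can be illustrated with a minimal toy implementation. Everything below (the tiny vocabulary, the relation inventory, the dimensions, and the learning rate) is an illustrative assumption, not the paper's actual configuration or code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy inventory; the paper extracts dependency and context
# relations from large-scale unlabeled text.
vocab = ["protein", "binds", "receptor", "the", "cell"]
relation_types = ["nsubj", "dobj", "context"]   # dependency + context relations
directions = ["head->dep", "dep->head"]

dim = 8
W = rng.normal(scale=0.1, size=(len(vocab), dim))                # shared word embeddings
# Three top layers, one per relation attribute:
U_item = rng.normal(scale=0.1, size=(dim, len(vocab)))           # related item
U_type = rng.normal(scale=0.1, size=(dim, len(relation_types)))  # relation type
U_dir = rng.normal(scale=0.1, size=(dim, len(directions)))       # relation direction

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(word_idx):
    """Given a centre word, predict the three attributes of a relation
    it participates in: related item, relation type, and direction."""
    h = W[word_idx]  # projection, as in skip-gram
    return softmax(h @ U_item), softmax(h @ U_type), softmax(h @ U_dir)

def train_step(word_idx, item_idx, type_idx, dir_idx, lr=0.1):
    """One SGD step on the summed cross-entropy of the three top layers."""
    h = W[word_idx]
    p_item, p_type, p_dir = forward(word_idx)
    # For softmax + cross-entropy, d(loss)/d(logits) = p - onehot(target).
    g_item = p_item.copy(); g_item[item_idx] -= 1.0
    g_type = p_type.copy(); g_type[type_idx] -= 1.0
    g_dir = p_dir.copy(); g_dir[dir_idx] -= 1.0
    # Back-propagate into the shared embedding before updating the top layers.
    grad_h = U_item @ g_item + U_type @ g_type + U_dir @ g_dir
    U_item[:, :] -= lr * np.outer(h, g_item)
    U_type[:, :] -= lr * np.outer(h, g_type)
    U_dir[:, :] -= lr * np.outer(h, g_dir)
    W[word_idx] -= lr * grad_h
    return -(np.log(p_item[item_idx]) + np.log(p_type[type_idx])
             + np.log(p_dir[dir_idx]))

# Toy relation: "protein" is the nsubj dependent of "binds", seen head->dep.
losses = [train_step(vocab.index("protein"), vocab.index("binds"),
                     relation_types.index("nsubj"), 0)
          for _ in range(50)]
assert losses[-1] < losses[0]  # loss decreases on the toy example
```

The key design point the abstract describes is that all three top layers share one embedding matrix, so the gradient flowing into `W` carries information about who the related word is, what syntactic role links them, and in which direction, rather than only co-occurrence as in skip-gram.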

Keywords: word representation; word embedding; word vector; neural network; relation model

Classification: TP391 (Automation and Computer Technology: Computer Application Technology)
