Distantly-supervised long-tailed relation extraction based on constraint graph (基于约束图的远程监督长尾关系抽取方法)


Authors: ZHANG Wanli (张万里) [1]; TONG An (佟安) [2]; LI Wenqiao (李文桥) [2]

Affiliations: [1] Unit 93209 of PLA, Beijing 100085, China; [2] Computer School, Beijing Information Science and Technology University, Beijing 100101, China

Source: Modern Electronics Technique (《现代电子技术》), 2024, Issue 21, pp. 91-96 (6 pages)

Abstract: Relation extraction, a core task in information extraction, extracts the relations between entity pairs from unstructured text. Distant supervision reduces the cost and burden of manual annotation by constructing training data automatically, but the underlying corpus is itself imbalanced, which leads to a long-tailed distribution of relations. To address this problem, a distantly-supervised long-tailed relation extraction method based on a constraint graph is proposed, following the idea of multi-instance learning. First, a constraint graph is constructed from the ontology structure of the knowledge graph and encoded with a graph convolutional network (GCN). Second, sentences are encoded with a segmented dilated convolutional neural network (CNN) and an entity attention mechanism. Finally, the two encodings are combined for classification. On the public dataset NYT10, the proposed model improves Hits@10, Hits@15 and Hits@20 by approximately 0.6%, 1.5% and 2.6%, respectively, over the best mainstream models, which demonstrates the importance of the constraint information between entity types and relations for distantly-supervised long-tailed relation extraction.
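For readers unfamiliar with the Hits@K metric reported in the abstract, the following is a minimal Python sketch of how such a score is commonly computed: an example counts as a hit when its gold relation is among the K highest-scoring candidate relations. The function name `hits_at_k`, the toy scores, and the plain per-example averaging are assumptions made for illustration. The abstract does not state the paper's exact evaluation protocol (for instance, which relations count as long-tailed or how results are averaged), so this should not be read as the authors' implementation.

```python
from typing import Sequence


def hits_at_k(scores: Sequence[Sequence[float]], gold: Sequence[int], k: int) -> float:
    """Fraction of examples whose gold relation index is among the k highest scores.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(scores) != len(gold):
        raise ValueError("scores and gold must have the same length")
    if not scores:
        return 0.0
    hits = 0
    for row, label in zip(scores, gold):
        # Indices of the k highest-scoring candidate relations for this example.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(scores)


# Toy data (made up): 3 examples, 4 candidate relations each.
toy_scores = [
    [0.10, 0.60, 0.20, 0.10],
    [0.50, 0.10, 0.30, 0.10],
    [0.30, 0.15, 0.05, 0.50],
]
toy_gold = [1, 2, 0]

hits_1 = hits_at_k(toy_scores, toy_gold, 1)  # only the first example is a hit
hits_2 = hits_at_k(toy_scores, toy_gold, 2)  # all three examples are hits
```

On this toy data only the first example has its gold relation ranked first, while all three have it within the top two, so the score rises as K grows. This is why the metric is usually reported at several cutoffs, as in the abstract.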

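Keywords: relation extraction; distant supervision; long-tailed distribution; constraint graph; deep learning; knowledge graph; attention mechanism; dilated convolution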

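Classification codes: TN911-34 [Electronics and Telecommunications: Communication and Information Systems]; TP391.1 [Electronics and Telecommunications: Information and Communication Engineering]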

 
