Iterative open information extraction based on biaffine attention


Authors: Li Xin [1,2]; Shao Jingqi; Wang Hao; He Li [1,2]; Duan Jianyong

Affiliations: [1] School of Information Science & Technology, North China University of Technology, Beijing 100144, China; [2] CNONIX National Standard Application & Promotion Lab, Beijing 100144, China

Source: Application Research of Computers, 2024, No. 7, pp. 2046-2051 (6 pages)

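Funding: National Key R&D Program of China (2020AAA0109700); National Natural Science Foundation of China (62076167, 61972003); R&D Program of the Beijing Municipal Education Commission (KM202210009002); Beijing Urban Governance Research Base of North China University of Technology (2023CSZL16).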

Abstract: Current open information extraction (OpenIE) methods cannot deliver both compact extractions and strong model performance, which limits how well their output serves downstream tasks. To address this, the paper proposes a model that uses biaffine attention for table filling and extracts iteratively. First, the model uses biaffine attention to learn directional information between words and to capture the interactions between word pairs; it then fills a two-dimensional table so that constituents of the sentence can be shared and compact constituents identified. Second, it uses a multi-head attention mechanism to apply the predicate and argument representations to the contextual embeddings, so that predicate extraction and argument extraction depend on each other and relation constituents are better linked to argument constituents. Finally, for sentences containing multiple relation constituents, it extracts iteratively, capturing the inherent dependencies between successive extractions without re-encoding the sentence. Experiments on the public datasets CaRB and Wire57 show that the method achieves higher precision and recall than the baseline methods, improving F1 by at least 1.4% and 3.2%, while producing shorter and semantically richer extractions.
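The abstract does not give the scoring function, so the following is only a minimal sketch of how biaffine scoring over word pairs can fill a two-dimensional table, assuming the common form s(i, j) = h_i^T U h_j + w^T [h_i; h_j] + b with one parameter set per table label. The function names, the two labels, and the toy vectors are hypothetical and are not taken from the paper.

```python
def biaffine_score(h_i, h_j, U, w, b):
    """Score the ordered pair (i, j): h_i^T U h_j + w^T [h_i; h_j] + b."""
    d = len(h_i)
    bilinear = sum(h_i[a] * U[a][c] * h_j[c] for a in range(d) for c in range(d))
    linear = sum(wk * xk for wk, xk in zip(w, h_i + h_j))
    return bilinear + linear + b


def fill_table(H, params):
    """Return an n x n table; cell (i, j) holds the index of the best-scoring label.

    H is a list of word vectors; params is a list of (U, w, b), one per label.
    """
    n = len(H)
    table = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            scores = [biaffine_score(H[i], H[j], U, w, b) for (U, w, b) in params]
            table[i][j] = max(range(len(scores)), key=scores.__getitem__)
    return table


# Toy input: two words with 2-dimensional vectors (assumed, not from the paper).
H = [[1.0, 0.0], [0.0, 1.0]]
zeros_U = [[0.0, 0.0], [0.0, 0.0]]
zeros_w = [0.0, 0.0, 0.0, 0.0]
params = [
    (zeros_U, zeros_w, 1.0),                  # label 0: hypothetical "no link"
    ([[0.0, 2.0], [0.0, 0.0]], zeros_w, 0.0), # label 1: hypothetical "linked"
]
table = fill_table(H, params)  # [[0, 1], [0, 0]]
```

Because U is not symmetric, the pair (word 0, word 1) receives label 1 while the reversed pair (word 1, word 0) receives label 0, which illustrates the kind of directional information between words that the abstract attributes to biaffine attention.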

Keywords: open information extraction; biaffine attention; compactness; multi-head attention; iterative extraction

Classification: TP391 [Automation and Computer Technology - Computer Application Technology]

 
