Authors: TUO Yuxin; XUE Tao [1] (School of Computer Science, Xi'an Polytechnic University, Xi'an, Shaanxi 710600, China)
Affiliation: [1] School of Computer Science, Xi'an Polytechnic University, Xi'an 710600, China
Source: Journal of Computer Applications, 2023, Issue 7, pp. 2116-2124 (9 pages)
Fund: Shaanxi Provincial Technology Innovation Guidance Program (2020CGXNG-012).
Abstract: To address the complexity of entity overlap and the difficulty of extracting multiple relational triples from natural language text, a joint triple extraction model that combines a pointer network with relation embedding is proposed. First, the input sentence is encoded with the BERT (Bidirectional Encoder Representations from Transformers) pre-trained model. Then, head and tail pointer tagging is used to extract all subjects in the sentence, and a subject- and relation-guided attention mechanism distinguishes the importance of each relation label to each word, so that relation label information is incorporated into the sentence embedding. Finally, for each subject and each relation, the corresponding object is extracted with pointer tagging and a cascade structure, generating the relational triples. Extensive experiments on the New York Times (NYT) and Web Natural Language Generation (WebNLG) datasets show that the proposed model improves overall performance over the current best Cascade Binary Tagging Framework (CasRel) model by 1.9 and 0.7 percentage points, respectively; compared with the span-based Extract-Then-Label (ETL-Span) model, it achieves improvements of more than 6.0% and more than 3.7% in comparison experiments on sentences containing 1 to 5 triples. In particular, on complex sentences containing more than 5 triples, the proposed model improves the F1 score by 8.5 and 1.3 percentage points, respectively, and it maintains stable extraction ability while capturing more entity pairs, which further verifies its effectiveness on the triple overlap problem.
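The abstract outlines a pipeline of BERT encoding, head/tail pointer tagging of subjects, fusion of relation embeddings into the sentence encoding via attention, and per-relation cascade tagging of objects. The PyTorch sketch below illustrates that tagging scheme in a minimal form; it is not the authors' implementation, and the module name, the additive fusion of subject and relation vectors, and the scaled dot-product attention scoring are assumptions made purely for illustration.

```python
# Minimal sketch (not the paper's code) of a cascade pointer-tagging head for
# joint triple extraction: subjects are tagged with head/tail pointers, relation
# embeddings are fused into the token representations via attention, and objects
# are tagged per (subject, relation) pair.
import torch
import torch.nn as nn

class TripleExtractionHead(nn.Module):
    def __init__(self, hidden_size: int, num_relations: int):
        super().__init__()
        # head/tail pointer taggers for subject spans
        self.subj_head = nn.Linear(hidden_size, 1)
        self.subj_tail = nn.Linear(hidden_size, 1)
        # learned embedding for every relation label
        self.rel_embed = nn.Embedding(num_relations, hidden_size)
        # head/tail pointer taggers for object spans (scored per relation)
        self.obj_head = nn.Linear(hidden_size, 1)
        self.obj_tail = nn.Linear(hidden_size, 1)
        self.scale = hidden_size ** 0.5

    def tag_subjects(self, token_repr):
        # token_repr: (batch, seq_len, hidden) from a BERT-style encoder
        head_prob = torch.sigmoid(self.subj_head(token_repr)).squeeze(-1)
        tail_prob = torch.sigmoid(self.subj_tail(token_repr)).squeeze(-1)
        return head_prob, tail_prob            # (batch, seq_len) each

    def tag_objects(self, token_repr, subj_repr):
        # subj_repr: (batch, hidden) pooled representation of one tagged subject
        batch, seq_len, hidden = token_repr.shape
        num_rel = self.rel_embed.num_embeddings
        rel = self.rel_embed.weight             # (num_rel, hidden)
        # relation-guided attention: how important each relation label is
        # to each word (softmax over relation labels per token)
        scores = token_repr @ rel.t() / self.scale      # (batch, seq, num_rel)
        attn = torch.softmax(scores, dim=-1)
        # fuse token, relation, and subject information before object tagging
        fused = (token_repr.unsqueeze(2)                # (batch, seq, 1, hidden)
                 + rel.view(1, 1, num_rel, hidden)
                 + subj_repr.view(batch, 1, 1, hidden))
        fused = fused * attn.unsqueeze(-1)
        # probability that each token starts/ends an object of the given
        # subject under each relation: (batch, seq_len, num_rel)
        obj_head_prob = torch.sigmoid(self.obj_head(fused)).squeeze(-1)
        obj_tail_prob = torch.sigmoid(self.obj_tail(fused)).squeeze(-1)
        return obj_head_prob, obj_tail_prob

# Usage with stand-in tensors (a real setup would feed BERT hidden states):
encoder_out = torch.randn(2, 16, 768)           # (batch, seq_len, hidden)
head = TripleExtractionHead(hidden_size=768, num_relations=24)
subj_h, subj_t = head.tag_subjects(encoder_out)
subj_vec = encoder_out[:, 3, :]                 # pooled repr. of one subject span
obj_h, obj_t = head.tag_objects(encoder_out, subj_vec)
```

Sigmoid pointer scores (rather than a single softmax over tags) let one token start or end spans under several relations at once, which is what allows a sentence to yield overlapping triples in this kind of cascade scheme.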
Keywords: information extraction; overlapping relation; triple extraction; BERT; attention mechanism; deep learning
CLC Number: TP391.1 [Automation and Computer Technology - Computer Application Technology]