Document-level relationship extraction based on evidence enhancement and multi-feature fusion

Authors: YAN Xinyue (颜新月), YANG Shuqun (杨淑群), GAO Yongbin (高永彬)

Affiliation: School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China

Source: Journal of Computer Applications (《计算机应用》), 2024, No. 11, pp. 3379-3385 (7 pages)

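Funding: Shanghai Local Capacity Building Project (21010501500); Shanghai "Science and Technology Innovation Action Plan" Social Development Science and Technology Key Project (21DZ1204900).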

Abstract: Document-level Relation Extraction (DocRE) aims to identify all relations that hold between entity pairs in a document. To address the ineffective use of evidence sentences and document information, as well as the problem of entities with multiple mentions, a multi-feature fusion DocRE model named EMF (Evidence Multi-feature Fusion) was constructed on top of evidence-enhanced contextual features. First, entity types were inserted before and after each entity, and relation text features were associated with entity mentions to obtain relation-specific entity features. Second, segment representations were obtained with convolution kernels of different sizes, and entity-pair-aware multi-granularity segment-level features were obtained through an attention mechanism; meanwhile, the evidence distribution was used to enhance the contextual features that are highly correlated with the entity pair. Finally, these features were fused for relation classification, and at inference time the obtained evidence was assembled into a pseudo-document that was fed to the classifier together with the original document. Experimental results on the DocRE dataset DocRED (Document-level Relation Extraction Dataset) show that, with BERT-base as the pre-trained language model (PLM) encoder, EMF improves Ign F1 and F1 by 0.42 and 0.41 percentage points respectively over the state-of-the-art model EIDER (EvIDence-Enhanced DocRE), reaching an F1 of 62.89%. These results indicate that EMF pays more attention to the parts of a document related to entities and relations, which improves extraction accuracy and gives the model good interpretability.
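The first step, adding entity types before and after each entity, can be pictured on a tokenized sentence. This is a minimal sketch assuming a marker format of `<TYPE>` and `</TYPE>` (a hypothetical choice; the abstract does not specify the marker tokens) and DocRED-style type labels such as PER and LOC. Every mention is wrapped, so an entity with several mentions receives markers at each of them.

```python
from typing import Dict, List, Tuple

# A mention is (start, end, entity_type); end is exclusive, indices are token positions.
Mention = Tuple[int, int, str]


def add_type_markers(tokens: List[str], mentions: List[Mention]) -> List[str]:
    """Wrap every mention in hypothetical typed markers such as <PER> ... </PER>."""
    before: Dict[int, List[str]] = {}  # token index -> markers placed before that token
    after: Dict[int, List[str]] = {}   # token index -> markers placed after that token
    for start, end, etype in mentions:
        before.setdefault(start, []).append(f"<{etype}>")
        after.setdefault(end - 1, []).append(f"</{etype}>")
    marked: List[str] = []
    for i, tok in enumerate(tokens):
        marked.extend(before.get(i, []))
        marked.append(tok)
        marked.extend(after.get(i, []))
    return marked


tokens = "Barack Obama was born in Hawaii .".split()
mentions = [(0, 2, "PER"), (5, 6, "LOC")]
print(" ".join(add_type_markers(tokens, mentions)))
# <PER> Barack Obama </PER> was born in <LOC> Hawaii </LOC> .
```

Because the type travels with the mention text, the encoder sees the type at every occurrence, which is the property the step relies on.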
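Associating relation text features with entity mentions lets each relation pick out the mentions most relevant to it, which is one way to handle entities with multiple mentions. This is a sketch of that idea, assuming dot-product scores between a relation's text embedding and each mention embedding, followed by softmax-weighted pooling. The scoring function, the function names, and the toy vectors are hypothetical and are not taken from the EMF implementation.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relation_specific_entity(mentions: List[Vector], relation: Vector) -> Vector:
    """Pool one entity's mention embeddings, weighting each mention by its
    dot-product affinity with a relation's text embedding (assumed scoring)."""
    weights = softmax([dot(m, relation) for m in mentions])
    dim = len(mentions[0])
    return [sum(w * m[d] for w, m in zip(weights, mentions)) for d in range(dim)]


# Two mentions of the same entity; each toy relation prefers a different one.
mentions = [[1.0, 0.0], [0.0, 1.0]]
born_in = [4.0, 0.0]   # toy "place of birth" relation-text embedding
employer = [0.0, 4.0]  # toy "employer" relation-text embedding
print(relation_specific_entity(mentions, born_in))   # leans toward mention 0
print(relation_specific_entity(mentions, employer))  # leans toward mention 1
```

The result is one entity vector per (entity, relation) pair rather than a single shared vector, so the same entity can be represented differently for different candidate relations.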
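Segment representations from convolution kernels of different sizes, followed by entity-pair attention, can be sketched in plain Python. Assumptions, none of which are stated in the abstract: each kernel is a list of scalar weights shared across embedding dimensions, the entity-pair query is the element-wise product of the head and tail embeddings, and the attended features from each granularity are concatenated.

```python
import math
from typing import List

Vector = List[float]


def conv1d(tokens: List[Vector], kernel: List[float]) -> List[Vector]:
    """'Valid' 1-D convolution over token embeddings; each output row is one segment.

    Simplification: one scalar weight per window position, shared across all
    embedding dimensions (a trained layer would learn a full weight tensor).
    """
    k, dim = len(kernel), len(tokens[0])
    if k > len(tokens):
        raise ValueError("kernel is longer than the token sequence")
    return [
        [sum(w * tok[d] for w, tok in zip(kernel, tokens[i:i + k])) for d in range(dim)]
        for i in range(len(tokens) - k + 1)
    ]


def attend(segments: List[Vector], query: Vector) -> Vector:
    """Softmax attention of an entity-pair query over segment vectors."""
    scores = [sum(a * b for a, b in zip(seg, query)) for seg in segments]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [sum(e / total * seg[d] for e, seg in zip(exps, segments)) for d in range(len(query))]


def multi_granularity_features(tokens: List[Vector], kernels: List[List[float]],
                               head: Vector, tail: Vector) -> Vector:
    # Hypothetical pair query: element-wise product of head and tail embeddings.
    pair_query = [h * t for h, t in zip(head, tail)]
    features: Vector = []
    for kernel in kernels:  # one granularity per kernel size
        features.extend(attend(conv1d(tokens, kernel), pair_query))
    return features  # concatenation: len(kernels) * dim values


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
kernels = [[0.5, 0.5], [1 / 3, 1 / 3, 1 / 3]]  # bigram- and trigram-sized segments
print(multi_granularity_features(tokens, kernels, head=[1.0, 0.2], tail=[1.0, 0.2]))
```

Each kernel size yields segments of a different length, and the pair query decides which segments of that length matter for the current entity pair.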
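The evidence distribution can be read as how much of an entity pair's attention falls inside each sentence. This sketch computes token weights from the product of head and tail attention (in the spirit of ATLOP's localized context pooling), sums them per sentence to obtain a sentence-level distribution, and then rescales each token by its sentence's share before pooling a context vector. The rescaling rule is a hypothetical stand-in for EMF's enhancement, which the abstract does not spell out, and the inputs are toy values.

```python
from typing import List, Tuple

Vector = List[float]


def normalize(xs: List[float]) -> List[float]:
    total = sum(xs)
    return [x / total for x in xs]


def evidence_enhanced_context(hidden: List[Vector], head_attn: List[float],
                              tail_attn: List[float], sent_ids: List[int],
                              n_sents: int) -> Tuple[Vector, List[float]]:
    """Return (context vector, sentence-level evidence distribution) for one entity pair.

    hidden[i] is token i's embedding, head_attn and tail_attn are the head and tail
    entities' attention over tokens, and sent_ids[i] is token i's sentence index.
    """
    # Tokens attended to by both entities get high weight (localized-context-style pooling).
    token_w = normalize([h * t for h, t in zip(head_attn, tail_attn)])
    # Evidence distribution: total token weight that falls inside each sentence.
    sent_dist = [0.0] * n_sents
    for w, s in zip(token_w, sent_ids):
        sent_dist[s] += w
    # Hypothetical enhancement: rescale each token by its sentence's evidence mass.
    boosted = normalize([w * sent_dist[s] for w, s in zip(token_w, sent_ids)])
    dim = len(hidden[0])
    context = [sum(w * h[d] for w, h in zip(boosted, hidden)) for d in range(dim)]
    return context, sent_dist


hidden = [[1.0], [2.0], [3.0], [4.0]]
context, dist = evidence_enhanced_context(
    hidden, head_attn=[0.1, 0.4, 0.4, 0.1], tail_attn=[0.1, 0.4, 0.1, 0.4],
    sent_ids=[0, 0, 1, 1], n_sents=2)
print(dist)  # about [0.68, 0.32]: sentence 0 carries most of the evidence
print(context)
```

Rescaling by sentence mass sharpens the context toward sentences that both entities attend to, which matches the stated goal of emphasizing context highly correlated with the entity pair.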
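At inference time the abstract describes assembling the obtained evidence into a pseudo-document and classifying it together with the original document. This sketch keeps sentences whose evidence probability reaches a threshold, in their original order, and then fuses the two sets of predictions. How EMF combines the two outputs is not stated in the abstract; summing logits and comparing them against a learned threshold class (the adaptive-threshold idea from ATLOP) is an assumption here, and the threshold, relation labels, and scores are toy values.

```python
from typing import Dict, List


def build_pseudo_document(sentences: List[str], sent_dist: List[float],
                          threshold: float) -> List[str]:
    """Keep sentences whose evidence probability reaches the (hypothetical)
    threshold, preserving their original document order."""
    return [s for s, p in zip(sentences, sent_dist) if p >= threshold]


def fuse_predictions(orig_logits: Dict[str, float], pseudo_logits: Dict[str, float],
                     threshold_label: str = "TH") -> List[str]:
    """Assumed fusion rule: sum the two sets of relation logits, then keep every
    relation that scores above the threshold class (ATLOP-style)."""
    fused = {r: orig_logits[r] + pseudo_logits[r] for r in orig_logits}
    cutoff = fused[threshold_label]
    return sorted(r for r, s in fused.items() if r != threshold_label and s > cutoff)


sentences = ["Obama was born in Hawaii.", "He liked surfing.", "Hawaii is a U.S. state."]
print(build_pseudo_document(sentences, [0.5, 0.1, 0.4], threshold=0.2))
# ['Obama was born in Hawaii.', 'Hawaii is a U.S. state.']

orig = {"TH": 0.0, "P17": 1.0, "P131": -0.5}   # P131 falls below the threshold on the full document
pseudo = {"TH": 0.0, "P17": 0.5, "P131": 0.8}  # the evidence-only view supports it
print(fuse_predictions(orig, pseudo))  # ['P131', 'P17']
```

The toy scores show the intended effect: a relation that is borderline on the full document can be recovered when the shorter, evidence-only input supports it.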
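Ign F1, reported alongside F1 in the results, discounts relational facts that already appear in the training set. This sketch assumes the computation used by the official DocRED evaluation script as we understand it: correct predictions that also occur among the annotated training facts are removed from the precision numerator and denominator, while recall is left unchanged. Facts are keyed by entity names here, a simplification of the script's mention-name matching, and the example triples are toy data.

```python
from typing import Set, Tuple

# (head entity name, tail entity name, relation id); names simplify DocRED's mention-name check.
Fact = Tuple[str, str, str]


def f1(p: float, r: float) -> float:
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


def f1_and_ign_f1(pred: Set[Fact], gold: Set[Fact], train: Set[Fact]) -> Tuple[float, float]:
    correct = pred & gold
    # Correct predictions that merely reproduce a fact already seen in training.
    in_train = sum(1 for fact in correct if fact in train)
    precision = len(correct) / len(pred) if pred else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    # Ign F1 drops those facts from the precision numerator and denominator; recall is unchanged.
    kept = len(pred) - in_train
    ign_precision = (len(correct) - in_train) / kept if kept > 0 else 0.0
    return f1(precision, recall), f1(ign_precision, recall)


gold = {("Obama", "Hawaii", "P19"), ("Hawaii", "United States", "P17"),
        ("Obama", "United States", "P27"), ("Obama", "Harvard", "P69")}
pred = {("Obama", "Hawaii", "P19"), ("Hawaii", "United States", "P17"),
        ("Obama", "Chicago", "P551")}
train = {("Hawaii", "United States", "P17")}
print(f1_and_ign_f1(pred, gold, train))  # F1 = 4/7 (about 0.571), Ign F1 = 0.5
```

Here precision is 2/3 and recall is 2/4; removing the one correct fact seen in training lowers precision to 1/2, so Ign F1 falls below F1.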

Keywords: document-level relation extraction; evidence; mention attention; segment feature

CLC number: TP309.2 [Automation and Computer Technology: Computer System Architecture]
