A relation extraction method based on improved RoBERTa, multiple-instance learning and dual attention mechanism


Authors: WANG Yuou; YUAN Yingchun[1,2]; HE Zhenxue; WANG Kejian[1] (College of Information Science and Technology, Hebei Agricultural University, Baoding 071001, Hebei, China; Hebei Province Key Laboratory of Agricultural Big Data, Hebei Agricultural University, Baoding 071001, Hebei, China)

Affiliations: [1] College of Information Science and Technology, Hebei Agricultural University, Baoding 071001, Hebei, China; [2] Hebei Province Key Laboratory of Agricultural Big Data (Hebei Agricultural University), Baoding 071001, Hebei, China

Source: Journal of Shandong University (Engineering Science), 2025, No. 2, pp. 78-87 (10 pages)

Funding: National Natural Science Foundation of China (62102130).

Abstract: To address the problems that distant-supervision relation extraction cannot make full use of high-level sentence-context information and easily introduces noisy labels, a relation extraction method based on an improved robustly optimized bidirectional encoder representations from Transformers pretraining approach (RoBERTa), multiple-instance learning (MI) and a dual attention (DA) mechanism was proposed. Full-word dynamic masking was introduced into RoBERTa to capture textual context and obtain word-level semantic vectors. The feature vectors were fed into a bidirectional gated recurrent unit (BiGRU) to mine deep semantic representations of the text. Multiple-instance learning was introduced to narrow the range of candidate relation categories by learning instance-level features. A dual attention mechanism was introduced, combining the advantages of word-level and sentence-level attention to fully capture the feature information of entity words in a sentence, improve the model's focus on informative sentences, and enhance sentence representations. Experimental results showed that the method achieved F1 values of 88.63% and 90.13% on the public New York Times (NYT) and Google IISc distant supervision (GIDS) datasets respectively, outperforming mainstream baseline methods. The method can effectively reduce the influence of distant-supervision noise, accomplish relation extraction, and provide a theoretical basis for constructing knowledge graphs.
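The two pooling stages the abstract describes can be illustrated with a minimal NumPy sketch: word-level attention turns BiGRU hidden states into one vector per sentence, and sentence-level attention weights the sentences of an entity-pair bag (the multiple-instance view) so that noisy sentences contribute less. This is a hypothetical simplification using plain dot-product attention with made-up query vectors `q_w` and `q_s`, not the paper's exact parameterization.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def word_attention(H, q_w):
    # H: (num_words, d) hidden states from the BiGRU for one sentence
    # q_w: (d,) learned word-level query (hypothetical parameterization)
    alpha = softmax(H @ q_w)          # attention weights over the words
    return alpha @ H                  # (d,) attended sentence representation

def sentence_attention(S, q_s):
    # S: (num_sentences, d) sentence vectors for one entity-pair bag
    # q_s: (d,) learned sentence-level query (hypothetical parameterization)
    beta = softmax(S @ q_s)           # noisy sentences receive low weight
    return beta @ S                   # (d,) bag representation for the classifier

rng = np.random.default_rng(0)
d = 8
# a bag of 3 sentences with 5, 3 and 7 words, stand-ins for BiGRU outputs
bag = [rng.standard_normal((n, d)) for n in (5, 3, 7)]
q_w, q_s = rng.standard_normal(d), rng.standard_normal(d)

S = np.stack([word_attention(H, q_w) for H in bag])   # word-level pooling
bag_vec = sentence_attention(S, q_s)                  # sentence-level pooling
print(bag_vec.shape)
```

Classifying the bag vector rather than individual sentences is what lets the multiple-instance setup tolerate distant-supervision label noise: a wrongly labeled sentence can be down-weighted instead of forcing a wrong gradient.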

Keywords: distant supervision; relation extraction; improved RoBERTa; multiple-instance learning; dual attention mechanism

Classification: TP391 (Automation and Computer Technology: Computer Application Technology)
