Authors: WANG Yuou; YUAN Yingchun [1,2]; HE Zhenxue; WANG Kejian [1] (College of Information Science and Technology, Hebei Agricultural University, Baoding 071001, Hebei, China; Hebei Province Key Laboratory of Agricultural Big Data, Hebei Agricultural University, Baoding 071001, Hebei, China)
Affiliations: [1] College of Information Science and Technology, Hebei Agricultural University, Baoding 071001, Hebei, China; [2] Hebei Province Key Laboratory of Agricultural Big Data (Hebei Agricultural University), Baoding 071001, Hebei, China
Source: Journal of Shandong University (Engineering Science), 2025, No. 2, pp. 78-87 (10 pages)
Funding: National Natural Science Foundation of China (62102130).
Abstract: To address the problem that distant-supervision relation extraction cannot make full use of high-level sentence-context information and tends to introduce noisy annotations, a relation extraction method was proposed based on an improved robustly optimized bidirectional encoder representations from Transformers pretraining approach (RoBERTa), multiple-instance learning (MI), and a dual attention (DA) mechanism. A whole-word dynamic mask was introduced into RoBERTa to capture textual context and obtain word-level semantic vectors. The feature vectors were fed into a bidirectional gated recurrent unit (BiGRU) to mine deep semantic representations of the text. Multiple-instance learning was introduced to narrow the range of candidate relation categories by learning instance-level features. A dual attention mechanism was introduced, combining the advantages of word-level and sentence-level attention to fully capture the features of entity words within a sentence, increase the model's focus on informative sentences, and strengthen sentence representations. Experimental results showed that the method achieved F1 values of 88.63% and 90.13% on the public New York Times (NYT) and Google IISc distant supervision (GIDS) datasets, respectively, outperforming mainstream comparison methods. The method can effectively reduce the influence of distant-supervision noise, realize relation extraction, and provide a theoretical basis for knowledge graph construction.
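The pipeline described in the abstract (RoBERTa encoding, BiGRU, word-level attention within each sentence, and sentence-level attention over the bag of sentences sharing one entity pair) can be sketched as below. This is a minimal illustrative sketch, not the authors' implementation: the class name DualAttentionRE, the hidden size, the use of the roberta-base checkpoint, the number of relation labels, and the bag-handling details are assumptions, and the whole-word dynamic masking applied to RoBERTa is omitted.

```python
# Minimal sketch (assumed, not the paper's code) of RoBERTa + BiGRU + dual attention
# for bag-level (multiple-instance) relation extraction.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class DualAttentionRE(nn.Module):
    def __init__(self, num_relations: int, hidden: int = 256,
                 encoder_name: str = "roberta-base"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)      # RoBERTa encoder
        enc_dim = self.encoder.config.hidden_size
        self.bigru = nn.GRU(enc_dim, hidden, batch_first=True,
                            bidirectional=True)                     # deeper sequential features
        self.word_attn = nn.Linear(2 * hidden, 1)                   # word-level attention scorer
        self.sent_attn = nn.Linear(2 * hidden, 1)                   # sentence-level attention scorer
        self.classifier = nn.Linear(2 * hidden, num_relations)

    def forward(self, input_ids, attention_mask):
        # input_ids: (bag_size, seq_len) -- all sentences in one entity-pair bag
        tok = self.encoder(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        seq, _ = self.bigru(tok)                                     # (bag, seq_len, 2*hidden)

        # Word-level attention: weight tokens within each sentence.
        w_scores = self.word_attn(seq).squeeze(-1)
        w_scores = w_scores.masked_fill(attention_mask == 0, -1e9)
        w_weights = torch.softmax(w_scores, dim=-1).unsqueeze(-1)
        sent_repr = (w_weights * seq).sum(dim=1)                     # (bag, 2*hidden)

        # Sentence-level attention: weight sentences within the bag, so noisy
        # distantly supervised instances contribute less to the bag representation.
        s_weights = torch.softmax(self.sent_attn(sent_repr).squeeze(-1), dim=0)
        bag_repr = (s_weights.unsqueeze(-1) * sent_repr).sum(dim=0)  # (2*hidden,)

        return self.classifier(bag_repr)                             # relation logits for the bag


if __name__ == "__main__":
    tokenizer = AutoTokenizer.from_pretrained("roberta-base")
    bag = ["Steve Jobs co-founded Apple in Cupertino.",
           "Jobs returned to Apple in 1997."]
    enc = tokenizer(bag, padding=True, return_tensors="pt")
    model = DualAttentionRE(num_relations=53)                        # label-set size is illustrative
    logits = model(enc["input_ids"], enc["attention_mask"])
    print(logits.shape)
```

In this reading, multiple-instance learning is realized by predicting one relation per entity-pair bag rather than per sentence, which is the usual way such pipelines limit the impact of individual mislabeled sentences.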
Keywords: distant supervision; relation extraction; improved RoBERTa; multiple-instance learning; dual attention mechanism
CLC number: TP391 [Automation and Computer Technology - Computer Application Technology]