Authors: WANG Tong; ZHANG Lijie; WANG Ming[5]; WU Huarui[3]; ZHU Huaji[3]; YANG Yingru[4]; WANG Chunshan[1,3]
Affiliations: [1] College of Information Science and Technology, Hebei Agricultural University, Baoding 071001, Hebei, China [2] College of Mechanical and Electrical Engineering, Hebei Agricultural University, Baoding 071001, Hebei, China [3] National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China [4] Shijiazhuang Academy of Agriculture and Forestry Sciences, Shijiazhuang 050041, Hebei, China [5] Hebei Education Examinations Authority, Shijiazhuang 050091, Hebei, China
Source: Journal of Hebei Agricultural University (《河北农业大学学报》), 2024, No. 3, pp. 113-120, 129 (9 pages in total)
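Funding: Natural Science Foundation of Hebei Province (F2022204004); National Bulk Vegetable Industry Technology System Project (CARS-23-D07); National Key Research and Development Program of China (2020YFD1100204).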
Abstract: To address polysemy, entity nesting, and triple overlap in entity and relation extraction, this paper proposes RBGPL, a joint extraction model that combines RoBERTa-WWM with a Global Pointer network. First, the RoBERTa-WWM pre-trained model is introduced so that fused contextual information resolves words whose meaning differs across contexts. Second, the Global Pointer annotation scheme is used to handle nested entities. Finally, a Global Pointer joint decoding model converts triple extraction into quintuple extraction, which resolves overlapping triples. On a self-built agricultural disease dataset, RBGPL reached a precision of 76.23%, a recall of 91.18%, and an F1 value of 83.04%; its F1 value was the highest among the joint extraction models compared, showing that it effectively overcomes polysemy and triple overlap. In addition, the F1 values for pathogen (Pathogeny) and crop name (Crop), two entity types prone to nesting, increased by 3% and 18% respectively, indicating that entity nesting was significantly alleviated. The method improves entity and relation extraction for the Chinese agricultural disease domain and can provide technical support for building knowledge graphs in that domain.
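As a quick consistency check on the figures in the abstract, the F1 value can be recomputed from the reported precision and recall. The sketch below assumes the standard definition of F1 as the harmonic mean of precision and recall; the abstract does not state which averaging scheme was used, and the function name `f1_score` is chosen here purely for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in the same units)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, in percent.
precision, recall = 76.23, 91.18
f1 = f1_score(precision, recall)
print(f"F1 = {f1:.2f}%")  # prints "F1 = 83.04%"
```

Under that assumption the recomputed value agrees with the reported 83.04%.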
Keywords: agricultural disease; joint extraction; RoBERTa-WWM; Global Pointer
Classification code: TP391.4 [Automation and Computer Technology: Computer Application Technology]