Authors: Qiao Yongpeng, Yu Yaxin, Liu Shuyue, Wang Ziteng, Xia Zifang, Qiao Jiaqi (乔勇鹏, 于亚新, 刘树越, 王子腾, 夏子芳, 乔佳琪)
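Affiliations: [1] School of Computer Science and Engineering, Northeastern University, Shenyang 110169; [2] Key Laboratory of Intelligent Computing in Medical Image (Northeastern University), Ministry of Education, Shenyang 110169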
Source: Journal of Computer Research and Development (《计算机研究与发展》), 2023, No. 1, pp. 153-166 (14 pages)
Funding: National Natural Science Foundation of China (61871106, 61973059); National Key Research and Development Program of China (2016YFC0101500).
Abstract: Extracting entity-relation triplets from unstructured natural language text is the most critical step in building a large-scale knowledge graph, but existing research still has three problems: 1) it ignores the relation overlapping problem that arises when multiple triplets in a text share the same entity; 2) current joint extraction models based on the encoder-decoder framework do not fully consider the dependency relationships among the words of a sentence; 3) some triplet sequences are excessively long, which causes errors to accumulate and propagate and reduces both the precision and the efficiency of entity-relation extraction. To address these problems, a graph convolution-enhanced multi-channel decoding joint entity and relation extraction model (GMCD-JERE) is proposed. First, a BiLSTM is used as the encoder to strengthen the bidirectional feature fusion of the words in the text. Second, multi-hop graph convolution fuses the dependency relationships between the words of a sentence, improving the accuracy of relation extraction. Third, the conventional decoding mechanism, which generates triplets one after another in sequence, is replaced by a multi-channel decoding mechanism that solves the relation overlapping problem and alleviates the error accumulation and propagation caused by overly long triplet sequences. Finally, the model is compared with three current mainstream models: on the NYT (New York Times) dataset it improves precision, recall, and F1 by 4.3%, 5.1%, and 4.8%, respectively, and experiments on the WebNLG (Web natural language generation) dataset verify an extraction order that begins with the relation.
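The abstract states that GMCD-JERE fuses word dependencies through multi-hop graph convolution but gives no layer definition. As a minimal sketch, assuming the widely used GCN propagation rule H' = ReLU(Â H W) with Â = D^(-1/2)(A + I)D^(-1/2), where A is built from a dependency parse whose arcs are treated as undirected, the code below shows why stacking k layers lets each word aggregate features from words up to k arcs away. The function names (`dependency_adjacency`, `normalize`, `gcn_layer`, `gcn`), the toy sentence, and the identity weights are hypothetical illustrations, not the paper's implementation.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def dependency_adjacency(n_tokens: int, arcs: List[Tuple[int, int]]) -> Matrix:
    """Build A + I from (head, dependent) arcs, treating arcs as undirected (an assumption)."""
    adj = [[0.0] * n_tokens for _ in range(n_tokens)]
    for i in range(n_tokens):
        adj[i][i] = 1.0  # self-loop: each word keeps its own features
    for head, dep in arcs:
        adj[head][dep] = 1.0
        adj[dep][head] = 1.0
    return adj


def normalize(adj: Matrix) -> Matrix:
    """Symmetric normalization D^(-1/2) (A + I) D^(-1/2)."""
    degree = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain nested-list matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(a_hat: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One graph-convolution hop: ReLU(A_hat @ H @ W)."""
    z = matmul(matmul(a_hat, h), w)
    return [[max(0.0, v) for v in row] for row in z]


def gcn(a_hat: Matrix, h: Matrix, weights: List[Matrix]) -> Matrix:
    """Apply one layer per weight matrix; k layers reach k-hop neighbours."""
    for w in weights:
        h = gcn_layer(a_hat, h, w)
    return h


# Hypothetical 3-word sentence: word 1 is the head of words 0 and 2,
# so words 0 and 2 are two dependency arcs apart.
a_hat = normalize(dependency_adjacency(3, [(1, 0), (1, 2)]))
h0 = [[0.0], [0.0], [1.0]]  # only word 2 carries a (1-d) feature
identity = [[1.0]]          # identity weights, for illustration only

one_hop = gcn(a_hat, h0, [identity])
two_hop = gcn(a_hat, h0, [identity, identity])
# one_hop[0][0] is 0.0; two_hop[0][0] is 1/6: word 2 reached word 0 via word 1
```

In the actual model the rows of H would presumably be the BiLSTM hidden states and each W a learned parameter matrix; the identity weights here only make the hop-by-hop spread of information visible.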
Keywords: relation extraction; encoder-decoder; multi-channel decoding; relation overlapping; graph convolutional neural network
CLC number: TP311 [Automation and Computer Technology: Computer Software and Theory]