Joint Extraction of Entities and Relations Based on Deep Learning: A Survey (基于深度学习的实体关系联合抽取研究综述)


Authors: ZHANG Yang-sen (张仰森)[1]; LIU Shuai-kang (刘帅康); LIU Yang (刘洋)[2]; REN Le (任乐); XIN Yong-hui (辛永辉)

Affiliations: [1] Institute of Intelligent Information Processing, Beijing Information Science and Technology University, Beijing 100192, China; [2] National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029, China

Source: Acta Electronica Sinica (《电子学报》), 2023, No. 4, pp. 1093-1116 (24 pages)

Funding: National Natural Science Foundation of China (No. 62176023).

Abstract: Entity-relation extraction is a core task in information extraction, and the entity-relation triples extracted from text are the basis for building large-scale knowledge graphs. The traditional pipeline method decomposes entity-relation extraction into two independent subtasks: named entity recognition and relation extraction. First, an efficient named entity recognizer is built to identify entity boundaries and types in large-scale unstructured text. Then, the entities and types it recognizes are used as the annotations for the data in the relation extraction task. Finally, a relation extractor predicts the relation category between two entities, and the results are combined into structured entity-relation triples. Errors in named entity recognition degrade the subsequent relation extraction, so the pipeline method suffers from error accumulation: the labeled data used for relation extraction come from the preceding named entity recognition step and carry its errors, which lowers the quality of the extracted relations. In addition, the pipeline method weakens the feature association between the two subtasks, which leads to redundant entities. Because the two subtasks are trained independently, they do not interact, the text information is not fully used, and the performance of the pipeline method is limited. For the same reason, the pipeline method is limited in extracting long-range dependencies between entities and struggles to match the performance of joint extraction models. In practice, entities often hold multiple relations with one another; the pipeline method cannot make full use of global text information, and named entity recognition produces redundant entities, so the method is limited when extracting multiple overlapping relations. The pipeline method therefore falls short when building high-accuracy entity-relation extraction models. This paper surveys the development of research on joint entity-relation extraction and briefly explains the common shortcomings of four kinds of joint models based on feature engineering: integer linear programming, card-pyramid parsing models, probabilistic graphical models, and structured prediction models. It focuses on deep-learning-based joint extraction of entities and relations and, drawing on recent frontier research, summarizes joint entity-relation extraction ...
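The three pipeline stages described in the abstract, and the way a recognition error carries into relation extraction, can be illustrated with a toy sketch. This is not the method of any system covered by the survey: the gazetteer, the trigger pattern, the relation label `born_in`, and all function names below are hypothetical stand-ins for trained models.

```python
from itertools import permutations

# Hypothetical toy gazetteer standing in for a trained NER model.
GAZETTEER = {"Marie Curie": "PERSON", "Warsaw": "LOCATION", "Poland": "LOCATION"}

# Hypothetical trigger patterns standing in for a trained relation classifier.
# Key: (head type, tail type, text between the mentions) -> relation label.
PATTERNS = {("PERSON", "LOCATION", "was born in"): "born_in"}


def recognize_entities(sentence, gazetteer=GAZETTEER):
    """Stage 1: return (mention, type, start offset) for each gazetteer hit."""
    found = []
    for mention, etype in gazetteer.items():
        start = sentence.find(mention)
        if start != -1:
            found.append((mention, etype, start))
    return sorted(found, key=lambda e: e[2])


def classify_relation(sentence, head, tail):
    """Stage 2: label one ordered entity pair, or return None."""
    head_end = head[2] + len(head[0])
    between = sentence[head_end:tail[2]].strip() if head_end <= tail[2] else ""
    return PATTERNS.get((head[1], tail[1], between))


def pipeline_extract(sentence, gazetteer=GAZETTEER):
    """Stage 3: classify every ordered pair of recognized entities into triples."""
    entities = recognize_entities(sentence, gazetteer)
    triples = []
    for head, tail in permutations(entities, 2):
        relation = classify_relation(sentence, head, tail)
        if relation is not None:
            triples.append((head[0], relation, tail[0]))
    return triples


sentence = "Marie Curie was born in Warsaw, Poland."

# With all three entities recognized, one triple is recovered.
print(pipeline_extract(sentence))

# Simulated NER miss: "Warsaw" is not recognized, so the relation stage
# never sees the correct pair and the triple is lost.
print(pipeline_extract(sentence, {"Marie Curie": "PERSON", "Poland": "LOCATION"}))
```

In this sketch the second stage only ever sees what the first stage emits, so a missed mention cannot be recovered later. The loop also pairs every recognized entity with every other one, including entities such as "Poland" that take part in no relation here, which loosely mirrors the redundant-entity problem the abstract mentions.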

Keywords: information extraction; knowledge graph; deep learning; joint entity-relation extraction; pipeline method

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
