Research on Chinese Name Recognition Based on Deep Learning and Coreference Resolution  (Cited by: 2)

Authors: CHEN Yu; XUAN Yuhang; ZHANG Yuzhi (School of Software, Nankai University, Tianjin 300450, China)

Affiliation: [1] School of Software, Nankai University, Tianjin 300450, China

Source: Frontiers of Data & Computing (《数据与计算发展前沿》), 2022, No. 2, pp. 63-73 (11 pages)

Funding: National Key Research and Development Program of China (2021YFB0300104).

Abstract: [Objective] Named entity recognition (NER) is a basic task in natural language processing. Entities include person names, place names, and organization names. Unlike other entity types, person names are closely tied to job titles, job changes, and personal pronouns. In person name recognition, incomplete name corpora and ambiguous pronoun reference are the main difficulties. Based on this observation, this paper proposes a sequence labeling method that integrates coreference resolution to improve person name recognition. The method alleviates the problem of incomplete name corpora and addresses ambiguous personal pronouns and high labor cost. [Methods] First, job changes are used for data augmentation, which mitigates the shortage of labeled data in practical applications. Next, to better learn contextual features, the method combines the pre-trained language model BERT with a bidirectional long short-term memory (BiLSTM) network and uses a conditional random field (CRF) to model the dependencies within the label sequence. Finally, a coreference resolution algorithm is applied to the personal pronouns in the text to further improve name recognition. [Results] Experiments on public datasets and on the dataset proposed in this paper both demonstrate the effectiveness of the proposed method.

Keywords: named entity recognition; coreference resolution; BERT; long short-term memory network

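Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]; TP18 [Automation and Computer Technology: Computer Science and Technology]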
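The abstract names the first and last steps of the method (data augmentation through job changes, and resolving personal pronouns) without giving their details. The sketch below is only an illustration of how such steps might look on character-level BIO-tagged text; it is not the authors' implementation. The title list `TITLES`, the pronoun set `PRONOUNS`, the `PER` tag names, and all function names are assumptions made for this example, and the nearest-preceding-name rule is a deliberately naive stand-in for a real coreference resolution algorithm.

```python
import random

# Hypothetical vocabularies; the paper's actual lists are not given in the abstract.
TITLES = ["董事长", "总经理", "副总裁"]
PRONOUNS = {"他", "她"}


def augment_by_title(chars, tags, titles=TITLES, rng=None):
    """Return a new (chars, tags) pair with each untagged job title swapped
    for a different one. `chars` must be a list of single characters."""
    rng = rng or random.Random()
    text = "".join(chars)
    by_length = sorted(titles, key=len, reverse=True)  # prefer the longest match
    out_chars, out_tags = [], []
    i = 0
    while i < len(chars):
        match = next(
            (t for t in by_length
             if text.startswith(t, i)
             and all(tag == "O" for tag in tags[i:i + len(t)])),
            None,
        )
        if match is None:
            out_chars.append(chars[i])
            out_tags.append(tags[i])
            i += 1
            continue
        alternatives = [t for t in titles if t != match]
        new_title = rng.choice(alternatives) if alternatives else match
        out_chars.extend(new_title)              # one list item per character
        out_tags.extend(["O"] * len(new_title))  # titles stay outside entities
        i += len(match)
    return out_chars, out_tags


def extract_person_spans(chars, tags):
    """Collect (start, end, name) for every B-PER/I-PER span."""
    spans, start = [], None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        if start is not None and tag != "I-PER":
            spans.append((start, i, "".join(chars[start:i])))
            start = None
        if tag == "B-PER":
            start = i
    return spans


def link_pronouns(chars, tags):
    """Naive heuristic: link each pronoun to the nearest preceding person name."""
    spans = extract_person_spans(chars, tags)
    links = []
    for i, ch in enumerate(chars):
        if ch in PRONOUNS and tags[i] == "O":
            preceding = [name for (_, end, name) in spans if end <= i]
            if preceding:
                links.append((i, preceding[-1]))
    return links
```

For a sentence such as 张伟任总经理,他表示 with 张伟 tagged as a person, `augment_by_title` would yield a variant with a different title and the same name tags, and `link_pronouns` would attach 他 to 张伟. In the pipeline the abstract describes, the tags would come from the BERT-BiLSTM-CRF tagger rather than being supplied by hand.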

 
