Chinese Named Entity Recognition Algorithm Fusing Multi-Granularity Dilated Convolution and Boundary Extension

Chinese Named Entity Recognition Algorithm Based on Multi-Particle Dilatational Convolution and Extended Boundary


Authors: HOU Xiangdan (侯向丹); SUN Xiaokai (孙晓凯); LIU Hongpu (刘洪普)

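Affiliations: [1] School of Artificial Intelligence and Data Science, Hebei University of Technology, Tianjin 300401, China; [2] Hebei Province Key Laboratory of Big Data Computing, Tianjin 300401, China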

Source: Computer Engineering and Applications (《计算机工程与应用》), 2025, No. 7, pp. 196-203 (8 pages)

Funding: Natural Science Foundation of Hebei Province (F2021202038).

Abstract: Named entity recognition (NER) is a fundamental task in natural language processing and a core component of common applications such as information extraction, question answering, and machine translation. Its output directly affects the performance of many downstream tasks, which makes it a valuable research topic. Existing work on Chinese NER, however, focuses mainly on regular entities; recognition of irregular entities and of entities with multiple surface forms, such as names of various kinds, still needs improvement. To address this, the paper builds on the regularity-inspired recognition network (RICON) and proposes the extension regular of entity network (ERoEN). A boundary extension mechanism captures the regularity of each extended entity span and thereby extracts the implicit features immediately before and after irregular entities, improving their recognition. A multi-granularity dilated convolution layer captures context at different distances and strengthens the association between an entity and its context, improving recognition of entities with multiple surface forms. On the OntoNotes 4, Weibo, and CLUENER2020 datasets, ERoEN improves F1 over RICON in most experiments, by 1.27, 1.02, and 1.13 percentage points respectively.
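To make the idea of capturing "context at different distances" concrete, the following is a minimal pure-Python sketch of dilated 1-D convolution applied at several dilation rates. It is an illustration of the general technique only, not the ERoEN layer: the function names, the kernel of ones, the dilation rates (1, 2, 3), the zero padding, and the choice to stack the per-rate outputs per token are all assumptions, since the abstract does not specify them.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """1-D dilated convolution with implicit zero padding (output length == input length).

    Kernel tap j reads the input at offset (j - center) * dilation from the
    current position, so a larger dilation looks at more distant neighbours
    without adding parameters.
    """
    center = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            pos = i + (j - center) * dilation
            if 0 <= pos < len(x):  # positions outside the sequence count as zero
                acc += w * x[pos]
        out.append(acc)
    return out


def multi_granularity_features(
    x: Sequence[float],
    kernel: Sequence[float],
    dilations: Sequence[int] = (1, 2, 3),  # hypothetical rates, not from the paper
) -> List[List[float]]:
    """Return one feature vector per token, with one entry per dilation rate."""
    per_rate = [dilated_conv1d(x, kernel, d) for d in dilations]
    return [list(column) for column in zip(*per_rate)]


# Toy per-character scores; powers of two make each neighbour's contribution visible.
tokens = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0]
kernel = [1.0, 1.0, 1.0]
features = multi_granularity_features(tokens, kernel)

# The middle token (value 8.0) sees neighbours 1, 2, and 3 positions away:
# features[3] == [28.0, 42.0, 73.0]
```

In a real model the same pattern would typically operate on embedding vectors rather than scalars, with learned kernels; the sketch only shows how each dilation rate widens the context window around a token.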

Keywords: named entity recognition; attention mechanism; dilated convolution; boundary extension

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
