Chinese Nested Named Entity Recognition Based on Location Embedding and Multilevel Prediction

Cited by: 1

Authors: DUAN Jianyong[1,2]; ZHU Yifei; WANG Hao; HE Li[1,2]; LI Xin

Affiliations: [1] School of Information, North China University of Technology, Beijing 100144, China; [2] CNONIX National Standard Application and Promotion Laboratory, Beijing 100144, China

Source: Computer Engineering (《计算机工程》), 2023, No. 12, pp. 71-77 (7 pages)

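Funding: National Natural Science Foundation of China (61972003); Humanities and Social Sciences Fund of the Ministry of Education of China (21YJA740052); Scientific Research Program of the Beijing Municipal Education Commission (KM202210009002).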

Abstract: Traditional Chinese nested Named Entity Recognition (NER) models often struggle to locate entity boundaries accurately, and the boundaries between Chinese characters and words are blurred. To address these problems, this study proposes a nested NER model based on position embedding and multilevel result boundary prediction. In the embedding layer, the position information of nested entities is encoded together with the position information of the text to generate an absolute position sequence. By attending to the position information inherent in Chinese text, the model further mines the relationship between nested entities and characters and strengthens the connection between nested entities and the original text. In the encoding layer, nested entities are preliminarily identified using a hidden matrix that excludes the optimal path. In the decoding layer, the offsets of entity boundaries are calculated through multilevel prediction and used to redetermine the entity boundaries, which improves the accuracy of Chinese nested entity recognition. Experimental results show that, compared with the best values among the baseline models, the proposed model improves precision, recall, and F1-value by 0.34, 1.06, and 0.80 percentage points, respectively, on the medical-domain dataset, and by 11.90, 0.78, and 6.23 percentage points, respectively, on the daily-domain dataset, demonstrating good performance in recognizing Chinese nested named entities.
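The abstract does not specify how nested-entity positions and text positions are combined into an "absolute position sequence", so the following is only a minimal sketch under stated assumptions. It swaps in a FLAT-style flat sequence (a hypothetical design, not confirmed as the authors' method): every character keeps head = tail = its own index, each candidate nested-entity span is appended as one extra token carrying its head and tail character indices, and both indices pass through the standard Transformer sinusoidal encoding. The names `flat_position_sequence`, `sinusoidal`, and `position_embeddings` are illustrative only.

```python
import math
from typing import List, Tuple

Token = Tuple[str, int, int]  # (surface text, head index, tail index)

def flat_position_sequence(text: str, spans: List[Tuple[int, int]]) -> List[Token]:
    """Hypothetical flat sequence: characters first, then candidate nested spans.

    Each character gets head == tail == its index; each span (inclusive
    character indices) becomes one extra token that keeps its own head and tail.
    """
    tokens = [(ch, i, i) for i, ch in enumerate(text)]
    for head, tail in spans:
        tokens.append((text[head:tail + 1], head, tail))
    return tokens

def sinusoidal(pos: int, dim: int) -> List[float]:
    """Transformer-style sinusoidal encoding of one absolute position."""
    vec: List[float] = []
    for i in range(0, dim, 2):
        angle = pos / (10000 ** (i / dim))
        vec.extend([math.sin(angle), math.cos(angle)])
    return vec[:dim]  # trim the extra cosine when dim is odd

def position_embeddings(tokens: List[Token], dim: int = 8) -> List[List[float]]:
    """Concatenate head and tail encodings (assumed design) so that nested
    spans sharing a start character still receive different vectors."""
    return [sinusoidal(head, dim) + sinusoidal(tail, dim) for _, head, tail in tokens]
```

For 北京大学 with candidate spans 北京 (characters 0-1) and 北京大学 (characters 0-3), the sequence has six tokens; the two nested spans share the same head encoding but differ in their tail encoding, which is one way the inner/outer relationship could stay visible to later layers.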

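The phrase "a hidden matrix that excludes the optimal path" is not detailed in the abstract. One established reading, borrowed from second-best sequence decoding used in earlier nested NER work and not confirmed as this paper's exact procedure, is: decode the highest-scoring tag path of a linear-chain CRF, then take the best path that differs from it as a candidate for additional (for example, inner) entities. The sketch below implements 2-best Viterbi decoding in pure Python; the emission scores and transition matrix are assumed inputs, and `two_best_paths` is a hypothetical name.

```python
from typing import List, Tuple

Beam = List[Tuple[float, int, int]]  # (score, previous tag, rank of previous entry)

def two_best_paths(emissions: List[List[float]],
                   transitions: List[List[float]]) -> List[Tuple[float, List[int]]]:
    """2-best Viterbi for a linear-chain CRF (hypothetical helper).

    emissions[t][j]: score of tag j at position t (assumed encoder output).
    transitions[i][j]: score of moving from tag i to tag j (assumed CRF parameters).
    Returns up to two (score, path) pairs: the optimal path, then the best
    path that differs from it.
    """
    k = 2
    n_steps, n_tags = len(emissions), len(emissions[0])
    # beams[t][j] keeps the k best partial paths that end in tag j at position t.
    beams: List[List[Beam]] = [[[(emissions[0][j], -1, -1)] for j in range(n_tags)]]
    for t in range(1, n_steps):
        layer: List[Beam] = []
        for j in range(n_tags):
            candidates = [
                (score + transitions[i][j] + emissions[t][j], i, rank)
                for i in range(n_tags)
                for rank, (score, _, _) in enumerate(beams[t - 1][i])
            ]
            candidates.sort(key=lambda c: c[0], reverse=True)
            layer.append(candidates[:k])
        beams.append(layer)

    # Rank complete paths over all final tags, keep the top k, and backtrack each.
    finals = [(score, j, rank)
              for j in range(n_tags)
              for rank, (score, _, _) in enumerate(beams[-1][j])]
    finals.sort(key=lambda c: c[0], reverse=True)
    results = []
    for score, tag, rank in finals[:k]:
        path = [tag]
        for t in range(n_steps - 1, 0, -1):
            _, prev_tag, prev_rank = beams[t][tag][rank]
            path.append(prev_tag)
            tag, rank = prev_tag, prev_rank
        results.append((score, path[::-1]))
    return results
```

With emissions `[[1.0, 0.2], [0.0, 2.0], [0.5, 0.1]]` and transitions `[[0.0, -1.0], [-1.0, 0.0]]`, the optimal path is `[1, 1, 1]` (score about 2.3) and the best path excluding it is `[0, 1, 1]` (about 2.1). Keeping the top k entries per (position, tag) suffices because every globally k-best path must extend one of the k best prefixes ending at its previous tag.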
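Keywords: nested named entity recognition; position embedding; boundary prediction unit; conditional random field; multilevel prediction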

CLC number: TP391 [Automation and Computer Technology / Computer Application Technology]
