Authors: DUAN Jianyong (段建勇)[1,2]; ZHU Yifei (朱奕霏); WANG Hao (王昊); HE Li (何丽)[1,2]; LI Xin (李欣)
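Affiliations: [1] School of Information, North China University of Technology, Beijing 100144, China; [2] CNONIX National Standard Application and Promotion Laboratory, Beijing 100144, China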
Source: Computer Engineering (《计算机工程》), 2023, No. 12, pp. 71-77 (7 pages)
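Funding: National Natural Science Foundation of China (61972003); Humanities and Social Sciences Fund of the Ministry of Education (21YJA740052); Scientific Research Program of the Beijing Municipal Education Commission (KM202210009002).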
Abstract: Traditional Chinese nested Named Entity Recognition (NER) models often struggle to locate entity boundaries accurately and to handle the blurred boundaries between Chinese characters and words. To address these problems, a nested NER model based on position embedding and multilevel result boundary prediction is proposed. In the embedding layer, the position information of nested entities is encoded together with the text position information to generate an absolute position sequence. By attending to the position information inherent in Chinese text, the model further captures the relationship between nested entities and characters and strengthens the connection between nested entities and the original text. In the encoding layer, nested entities are initially identified using a hidden matrix that excludes the best path. In the decoding layer, the offsets of entity boundaries are calculated and the boundaries are redetermined, which improves the accuracy of Chinese nested entity recognition. Experimental results on datasets from the medical and daily-life domains show that, compared with the best values among the baseline models, the proposed model improves precision, recall, and F1 by 0.34, 1.06, and 0.80 percentage points, respectively, on the medical dataset and by 11.90, 0.78, and 6.23 percentage points, respectively, on the daily-life dataset.
Keywords: nested named entity recognition; position embedding; boundary prediction unit; conditional random field; multilevel prediction
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]