Chinese named entity recognition based on multiple word segmentation cases (基于多种分词情况的中文命名实体识别)


Authors: TIAN Di (田地); SHAO Yu-bin (邵玉斌) [1]; DU Qing-zhi (杜庆治) [1]; LONG Hua (龙华) [1]; MA Di-nan (马迪南)

Affiliations: [1] Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China; [2] Key Laboratory of Media Integration of Yunnan Province, Kunming 650032, China

Source: Journal of Lanzhou University (Natural Sciences) (《兰州大学学报(自然科学版)》), 2024, No. 3, pp. 350-356 (7 pages)

Funding: Project of the Key Laboratory of Media Integration of Yunnan Province (320225403).

Abstract: To address the unclear word boundaries in Chinese and the neglect of contextual relations between words and sentences, an ambiguous-segmentation information suppression algorithm based on multiple word segmentation cases is designed. In preprocessing, the weights of the different candidate segmentations of a sentence are computed from a pre-trained word frequency table; the most likely segmentation is distinguished from the other candidates, and all of them are merged into the sentence. When the self-attention mechanism extracts the sentence's contextual information, the segmentation weights are added to it, so that the valid boundary information of correct segmentations is included while the erroneous contextual relations introduced by ambiguous segmentations are suppressed. In comparisons with the MarkBert and W2NER algorithms on the public datasets Resume, MSRA, Weibo, and OntoNotes, the proposed algorithm achieves better prediction accuracy, better robustness as sentence length increases, and better prediction accuracy as the dataset grows.
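The preprocessing idea in the abstract can be illustrated with a small sketch. The abstract does not give the paper's formulas, so everything here is an assumption made for illustration: a unigram log-probability score for each candidate segmentation, a softmax to turn scores into weights, and a "same-word" matrix that could serve as an additive bias on self-attention scores. The function names and the toy frequency table are hypothetical and are not taken from the paper.

```python
import math

def candidate_segmentations(sentence, freq, max_len=4):
    """Enumerate every split of `sentence` into dictionary words or single characters."""
    if not sentence:
        return [[]]
    results = []
    for end in range(1, min(max_len, len(sentence)) + 1):
        word = sentence[:end]
        # Single characters are always allowed so every sentence can be split.
        if end == 1 or word in freq:
            for rest in candidate_segmentations(sentence[end:], freq, max_len):
                results.append([word] + rest)
    return results

def segmentation_weights(candidates, freq):
    """Assumed scoring: unigram log-probabilities, normalised with a softmax."""
    total = sum(freq.values())
    # Words missing from the table (only single characters) get a count of 1.
    scores = [sum(math.log(freq.get(w, 1) / total) for w in seg) for seg in candidates]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]

def same_word_bias(sentence, candidates, weights):
    """bias[i][j] = total weight of segmentations placing characters i and j in one word."""
    n = len(sentence)
    bias = [[0.0] * n for _ in range(n)]
    for seg, weight in zip(candidates, weights):
        start = 0
        for word in seg:
            end = start + len(word)
            for i in range(start, end):
                for j in range(start, end):
                    bias[i][j] += weight
            start = end
    return bias

# Toy frequency table (hypothetical counts, not from the paper).
freq = {"南京": 50, "南京市": 40, "市长": 30, "长江": 60, "大桥": 40, "长江大桥": 35}
sentence = "南京市长江大桥"

candidates = candidate_segmentations(sentence, freq)
weights = segmentation_weights(candidates, freq)
best = candidates[weights.index(max(weights))]
bias = same_word_bias(sentence, candidates, weights)
```

In this sketch, character pairs that share a word in highly weighted segmentations (such as 南 and 市 in 南京市) receive a large bias, while pairs that share a word only in unlikely segmentations (such as 市 and 长 in 市长) receive a small one. That is one plausible way to add boundary information from likely segmentations while damping ambiguous ones; the paper's actual mechanism may differ.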

Keywords: named entity recognition; pre-trained model; self-attention; word boundary information

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
