Authors: YANG Sanhe; LAI Peichao; FU Yanggeng [1]; WANG Yilei [1]; YE Feiyang; ZHANG Lin [1]
Affiliation: [1] College of Computer and Data Science, Fuzhou University, Fuzhou 350108, China
Source: Journal of Chinese Computer Systems, 2025, No. 3, pp. 602-611 (10 pages)
Funding: Supported by the National Natural Science Foundation of China (12271098), the Natural Science Foundation of Fujian Province (2022J01120), and the Fujian Provincial University-Industry Cooperation Science and Technology Program (2023H6008).
Abstract: To address the challenges of Chinese few-shot named entity recognition (NER), a BERT optimization approach tailored to Chinese few-shot NER is proposed. The approach comprises two optimizations. First, because an insufficient number of training samples limits the semantic perception ability of the pre-trained language model BERT, ProConBERT, a BERT pre-training strategy based on prompt learning and contrastive learning, is proposed. In the prompt-learning phase, masked filling templates are designed to train BERT to predict the corresponding Chinese label word for each token. In the contrastive-learning phase, guided templates train BERT to learn the similarity and dissimilarity between each token and the label words. Second, to handle the complexity caused by the lack of explicit word boundaries in Chinese, the first Transformer layer of BERT is modified and a feature fusion module with a hybrid weight guider is designed to integrate lexicon information into the bottom layer of BERT. Finally, experiments verify the effectiveness and superiority of the proposed method on Chinese few-shot NER. Combined with a conditional random field (CRF) layer, the method achieves the best performance on four sampled Chinese NER datasets. In particular, in the three few-shot scenarios of the Weibo dataset, the model's F1 scores reach 63.78%, 66.27%, and 70.90%, and compared with other methods the average F1 scores improve by 16.28%, 14.30%, and 11.20%, respectively. Furthermore, applying ProConBERT to several BERT-based Chinese NER models further improves entity recognition performance.
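The masked-filling prompt pre-training described in the abstract can be pictured with a short sketch. This is a minimal illustration, not the authors' implementation: the template wording, the label-word map LABEL_WORDS, and the checkpoint bert-base-chinese are all assumptions; the idea shown is training BERT to predict a Chinese label word at a [MASK] position for a given token.

    import torch
    from transformers import BertForMaskedLM, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
    model = BertForMaskedLM.from_pretrained("bert-base-chinese")

    # Hypothetical single-character label words for each entity type.
    LABEL_WORDS = {"PER": "人", "LOC": "地", "ORG": "机", "O": "非"}

    def prompt_mlm_loss(sentence, token, gold_label):
        # Hypothetical masked filling template: "<sentence>。<token>是[MASK]。"
        text = f"{sentence}。{token}是{tokenizer.mask_token}。"
        enc = tokenizer(text, return_tensors="pt")
        # Only the [MASK] position contributes to the MLM loss (-100 is ignored).
        labels = torch.full_like(enc["input_ids"], -100)
        mask_idx = (enc["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
        labels[0, mask_idx] = tokenizer.convert_tokens_to_ids(LABEL_WORDS[gold_label])
        return model(**enc, labels=labels).loss

    loss = prompt_mlm_loss("福州大学位于福州", "福州大学", "ORG")
    loss.backward()  # one pre-training step on a single example

In the paper's contrastive-learning phase, a second set of guided templates would then pull each token's representation toward its correct label word and away from the others; the sketch above covers only the prompt-learning objective.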
Keywords: Chinese few-shot named entity recognition; prompt learning; contrastive learning; pre-training; feature fusion; BERT model
Classification: TP391 (Automation and Computer Technology / Computer Application Technology)
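The abstract's second optimization, a feature fusion module with a hybrid weight guider that injects lexicon information into BERT's bottom layer, can likewise be sketched. This is a guess at the general shape, not the paper's exact module: the sigmoid gate, the projection, and the dimensions (768 for BERT hidden states, 200 for word embeddings) are illustrative assumptions.

    import torch
    import torch.nn as nn

    class HybridWeightFusion(nn.Module):
        """Gate-weighted fusion of BERT character features and lexicon word features."""
        def __init__(self, char_dim=768, word_dim=200):
            super().__init__()
            self.proj = nn.Linear(word_dim, char_dim)        # align lexicon embeddings
            self.guider = nn.Linear(2 * char_dim, char_dim)  # assumed form of the weight guider

        def forward(self, char_feat, word_feat):
            # char_feat: [batch, seq, char_dim]; word_feat: [batch, seq, word_dim]
            word_feat = self.proj(word_feat)
            gate = torch.sigmoid(self.guider(torch.cat([char_feat, word_feat], dim=-1)))
            # Convex combination; the fused states would re-enter the first Transformer layer.
            return gate * char_feat + (1 - gate) * word_feat

    fusion = HybridWeightFusion()
    chars = torch.randn(2, 16, 768)  # BERT first-layer character states
    words = torch.randn(2, 16, 200)  # matched dictionary word embeddings
    fused = fusion(chars, words)     # shape [2, 16, 768]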