Authors: YANG Sanhe; LAI Peichao; FU Yanggeng [1]; WANG Yilei [1]; YE Feiyang; ZHANG Lin [1]
Affiliation: [1] College of Computer and Data Science, Fuzhou University, Fuzhou 350108, China
Source: Journal of Chinese Computer Systems, 2025, No. 3, pp. 602-611 (10 pages)
Funding: Supported by the National Natural Science Foundation of China (12271098), the Natural Science Foundation of Fujian Province (2022J01120), and the Fujian Provincial University-Industry Cooperation Science and Technology Program (2023H6008).
Abstract: To address the challenges of Chinese few-shot named entity recognition (NER), a BERT optimization approach tailored to Chinese few-shot NER is proposed. The approach comprises two optimizations. First, because an insufficient number of training samples limits the semantic perception ability of the pre-trained language model BERT, ProConBERT, a BERT pre-training strategy based on prompt learning and contrastive learning, is proposed. In the prompt-learning phase, masked filling templates are designed to train BERT to predict the corresponding Chinese label word for each token. In the contrastive-learning phase, guided templates train BERT to learn the similarity and dissimilarity between each token and the label words. Second, to handle the complexity caused by the lack of explicit word boundaries in Chinese, the first Transformer layer of BERT is modified and a feature fusion module with a hybrid weight guider is designed to integrate lexicon information into the bottom layer of BERT. Finally, experiments verify the effectiveness and superiority of the proposed method on Chinese few-shot NER. Combined with a conditional random field (CRF) layer, the method achieves the best performance on four sampled Chinese NER datasets. In particular, in the three few-shot scenarios of the Weibo dataset, the model's F1 scores reach 63.78%, 66.27%, and 70.90%, and compared with other methods the average F1 scores improve by 16.28%, 14.30%, and 11.20%, respectively. Furthermore, applying ProConBERT to several BERT-based Chinese NER models further improves entity recognition performance.
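The masked-filling prompt pre-training described in the abstract can be pictured with a short sketch. This is a minimal illustration, not the authors' implementation: the template wording, the label-word map LABEL_WORDS, and the checkpoint bert-base-chinese are all assumptions; the idea shown is training BERT to predict a Chinese label word at a [MASK] position for a given token.

    import torch
    from transformers import BertForMaskedLM, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
    model = BertForMaskedLM.from_pretrained("bert-base-chinese")

    # Hypothetical single-character label words for each entity type.
    LABEL_WORDS = {"PER": "人", "LOC": "地", "ORG": "机", "O": "非"}

    def prompt_mlm_loss(sentence, token, gold_label):
        # Hypothetical masked filling template: "<sentence>。<token>是[MASK]。"
        text = f"{sentence}。{token}是{tokenizer.mask_token}。"
        enc = tokenizer(text, return_tensors="pt")
        # Only the [MASK] position contributes to the MLM loss (-100 is ignored).
        labels = torch.full_like(enc["input_ids"], -100)
        mask_idx = (enc["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
        labels[0, mask_idx] = tokenizer.convert_tokens_to_ids(LABEL_WORDS[gold_label])
        return model(**enc, labels=labels).loss

    loss = prompt_mlm_loss("福州大学位于福州", "福州大学", "ORG")
    loss.backward()  # one pre-training step on a single example

In the paper's contrastive-learning phase, a second set of guided templates would then pull each token's representation toward its correct label word and away from the others; the sketch above covers only the prompt-learning objective.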
Keywords: Chinese few-shot named entity recognition; prompt learning; contrastive learning; pre-training; feature fusion; BERT model
Classification: TP391 (Automation and Computer Technology / Computer Application Technology)
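The abstract's second optimization, a feature fusion module with a hybrid weight guider that injects lexicon information into BERT's bottom layer, can likewise be sketched. This is a guess at the general shape, not the paper's exact module: the sigmoid gate, the projection, and the dimensions (768 for BERT hidden states, 200 for word embeddings) are illustrative assumptions.

    import torch
    import torch.nn as nn

    class HybridWeightFusion(nn.Module):
        """Gate-weighted fusion of BERT character features and lexicon word features."""
        def __init__(self, char_dim=768, word_dim=200):
            super().__init__()
            self.proj = nn.Linear(word_dim, char_dim)        # align lexicon embeddings
            self.guider = nn.Linear(2 * char_dim, char_dim)  # assumed form of the weight guider

        def forward(self, char_feat, word_feat):
            # char_feat: [batch, seq, char_dim]; word_feat: [batch, seq, word_dim]
            word_feat = self.proj(word_feat)
            gate = torch.sigmoid(self.guider(torch.cat([char_feat, word_feat], dim=-1)))
            # Convex combination; the fused states would re-enter the first Transformer layer.
            return gate * char_feat + (1 - gate) * word_feat

    fusion = HybridWeightFusion()
    chars = torch.randn(2, 16, 768)  # BERT first-layer character states
    words = torch.randn(2, 16, 200)  # matched dictionary word embeddings
    fused = fusion(chars, words)     # shape [2, 16, 768]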