Authors: ZENG Pengwu, XIE Zhipeng [1] (School of Computer Science, Fudan University, Shanghai 200438, China)
Affiliation: [1] School of Computer Science, Fudan University, Shanghai 200438
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2025, No. 3, pp. 513-519 (7 pages)
Funding: Supported by the National Natural Science Foundation of China (62076072).
Abstract: Existing named entity recognition (NER) methods require large amounts of training data and tend to overfit in few-shot settings. To address this problem, a method based on masked prediction with a pre-trained language model is proposed, which introduces contextual information to improve generalization. First, two BERT (Bidirectional Encoder Representations from Transformers) branches are used; through mask replacement, they compute the token representation and the contextual representation of the current word, respectively. Next, a probability vector over the BERT vocabulary is computed from each representation, and an adaptive gating mechanism combines the two vectors as a weighted sum to obtain a fused probability. If the word with the highest predicted probability is one of the predefined class label words, the current token is classified as an entity; otherwise, it is classified as a non-entity. Experiments on CoNLL03, OntoNotes 5.0 and MIT-Movie, which come from different domains, show that the proposed method improves the average F1 score by 12% over the baseline method and by 4% to 11% over prompt-based methods. These results indicate that the method effectively improves generalization in few-shot settings and confirm the value of introducing contextual information.
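The fusion and decision steps described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration in plain Python, not the authors' implementation: the five-word toy vocabulary, the label words `location`/`person`, the logit values, and the gate form g = sigmoid(w · [h_token; h_context] + b) are all assumptions, since the abstract does not say how the gate is parameterized. In a real system, each logit vector would come from the masked-language-model head of one BERT branch, with the context branch seeing `[MASK]` in place of the current word.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a vocabulary-sized logit vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z: float) -> float:
    """Overflow-safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def adaptive_gate(h_token: Sequence[float], h_context: Sequence[float],
                  w: Sequence[float], b: float) -> float:
    """ASSUMED gate form: g = sigmoid(w . [h_token; h_context] + b).

    The abstract only states that an adaptive gate weights the two
    probability vectors; this linear-plus-sigmoid form is hypothetical.
    """
    features = list(h_token) + list(h_context)
    if len(features) != len(w):
        raise ValueError("gate weight size must match concatenated hidden size")
    z = sum(wi * xi for wi, xi in zip(w, features)) + b
    return sigmoid(z)


def fuse(p_token: Sequence[float], p_context: Sequence[float], g: float) -> List[float]:
    """Weighted sum g * p_token + (1 - g) * p_context over the vocabulary."""
    return [g * pt + (1.0 - g) * pc for pt, pc in zip(p_token, p_context)]


def classify_token(probs: Sequence[float], vocab: Sequence[str],
                   label_words: Dict[str, str], outside: str = "O") -> str:
    """Return the entity class if the arg-max word is a label word, else non-entity."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return label_words.get(vocab[best], outside)


# Toy 5-word vocabulary standing in for BERT's full vocabulary (hypothetical).
vocab = ["paris", "city", "location", "person", "the"]
label_words = {"location": "LOC", "person": "PER"}  # hypothetical label words

# Token branch: the word itself is visible, so it mostly reconstructs "paris".
p_token = softmax([3.0, 0.0, 2.5, 0.0, 0.0])
# Context branch: the word is masked, so only the surrounding context drives it.
p_context = softmax([0.0, 1.0, 3.0, 0.5, 0.0])

# Zero gate weights and bias give g = sigmoid(0) = 0.5 (equal weighting).
g = adaptive_gate([0.2, -0.1], [0.4, 0.3], w=[0.0] * 4, b=0.0)
fused = fuse(p_token, p_context, g)

print(classify_token(fused, vocab, label_words))    # LOC
print(classify_token(p_token, vocab, label_words))  # O (token branch alone)
```

With g = 0.5, the more confident context branch shifts the arg-max from `paris` to the label word `location`, so the token is tagged as an entity; the token branch alone would leave it as non-entity. Because the fusion is a convex combination, the fused vector remains a valid probability distribution for any gate value in [0, 1].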
CLC number: TP391 [Automation and Computer Technology / Computer Application Technology]