Incorporating similarity negative sampling for distantly supervised NER

Cited by: 1

Authors: Liu Yang, Xian Yantuan [1,2], Xiang Yan, Huang Yuxin [1,2]

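Affiliations: [1] Faculty of Information Engineering & Automation, Kunming University of Science & Technology, Kunming 650500, China; [2] Yunnan Key Laboratory of Artificial Intelligence, Kunming 650500, China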

Source: Application Research of Computers (《计算机应用研究》), 2024, No. 8, pp. 2322-2328 (7 pages)

Funding: National Natural Science Foundation of China (62266028); Yunnan Major Science and Technology Special Program (202202AD080003).

Abstract: Missing entity annotations are a major difficulty in distantly supervised named entity recognition (DS-NER). Unlabeled entities in the training set provide incorrect supervision signals during training: the model becomes more inclined to predict entities of that kind as non-entities, which degrades its entity recognition and classification ability and harms its generalization. To address this problem, this paper proposes a named entity recognition method that incorporates negative sampling based on entity feature similarity. First, the method computes and scores the similarity between candidate samples and labeled entity samples. Second, it samples from the candidates according to these similarity scores to select the samples that take part in training. Compared with random negative sampling, using the similarity scores lowers the probability of sampling unlabeled entities as negatives, which improves the quality of the training data and in turn the performance of the model. Experiments on three datasets, CoNLL03, Wiki, and Twitter, show that the method improves the F1 score by about 5 percentage points on average over the baseline model, indicating that it effectively alleviates the performance degradation that missing entity annotations cause in NER models under distant supervision.
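The two steps described in the abstract (score each candidate against the labeled entities, then sample negatives according to the score) can be illustrated with a minimal sketch. The abstract does not specify the feature representation, the similarity measure, or the sampling distribution, so everything below is an assumption made for illustration: spans are plain feature vectors, similarity is cosine similarity, a candidate's score is its highest similarity to any labeled entity, and the sampling weight is `1 - score`. The function names `similarity_scores` and `sample_negatives` are hypothetical and are not taken from the paper.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def similarity_scores(candidates, entities):
    """Step 1 (assumed form): score each candidate by its highest
    cosine similarity to any labeled entity vector."""
    return [max((cosine(c, e) for e in entities), default=0.0) for c in candidates]


def sample_negatives(candidates, entities, k, seed=0):
    """Step 2 (assumed form): draw up to k candidate indices without
    replacement, with weight 1 - score, so candidates that resemble
    labeled entities are unlikely to be used as negatives."""
    rng = random.Random(seed)
    weights = [max(0.0, 1.0 - s) for s in similarity_scores(candidates, entities)]
    keyed = []
    for idx, w in enumerate(weights):
        if w > 0.0:  # weight 0 means "looks exactly like an entity": never sample
            u = 1.0 - rng.random()  # uniform in (0, 1], keeps log() defined
            # Efraimidis-Spirakis key u**(1/w), compared in log space
            keyed.append((math.log(u) / w, idx))
    keyed.sort(reverse=True)
    return sorted(idx for _, idx in keyed[:k])


# Toy data: candidate 0 matches the labeled entity, candidate 2 is close to it.
labeled = [[1.0, 0.0]]
spans = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
print(sample_negatives(spans, labeled, k=2))
```

With uniform random negative sampling, every candidate in `spans` would be equally likely to be drawn; here the candidate identical to a labeled entity is excluded and the near-duplicate is drawn only rarely, which is the effect the abstract attributes to similarity-based sampling.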

Keywords: named entity recognition; entity omission; distant supervision; negative sampling; data augmentation

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

