Authors: Liu Yang (刘杨); Xian Yantuan (线岩团)[1,2]; Xiang Yan (相艳); Huang Yuxin (黄于欣)[1,2]
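Affiliations: [1] Faculty of Information Engineering & Automation, Kunming University of Science & Technology, Kunming 650500, China; [2] Yunnan Key Laboratory of Artificial Intelligence, Kunming 650500, China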
Source: Application Research of Computers (《计算机应用研究》), 2024, No. 8, pp. 2322-2328 (7 pages)
Funding: National Natural Science Foundation of China (62266028); Yunnan Major Science and Technology Special Program project (202202AD080003).
Abstract: Entity omission is a persistent difficulty in distantly supervised named entity recognition (DS-NER). Unlabeled entities in the training set supply incorrect supervision: the model learns to predict entities of those types as non-entities, which weakens its entity recognition and classification ability and hurts its generalization. To address this problem, the paper proposes a negative-sampling method for named entity recognition that incorporates entity feature similarity. First, it computes a similarity score between each candidate sample and the labeled entity samples. Second, it samples from the candidates according to those scores to select the samples that take part in training. Compared with random negative sampling, using similarity lowers the chance of sampling an unlabeled entity as a negative, which improves the quality of the training data and, in turn, the performance of the model. Experiments on three datasets (CoNLL03, Wiki, and Twitter) show that the method improves F1 over the baseline model by about 5 percentage points on average, indicating that it effectively mitigates the performance degradation that entity omission causes in named entity recognition under distant supervision.
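The two steps in the abstract (score candidates by similarity to labeled entities, then sample by score) can be sketched roughly as follows. This is only an illustration under assumptions, not the authors' implementation: the abstract does not state the similarity measure, the span representation, or the sampling distribution, so cosine similarity over plain feature vectors, a softmax-style weighting, and the names `sampling_weights`, `sample_negatives`, and `temperature` are all hypothetical choices.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sampling_weights(candidates, entities, temperature=0.1):
    """Give each candidate a probability that shrinks as it resembles a labeled entity."""
    weights = []
    for c in candidates:
        # Similarity score: closest match among the labeled entity vectors.
        score = max((cosine(c, e) for e in entities), default=0.0)
        # Assumed weighting: high similarity -> low chance of being a negative.
        weights.append(math.exp(-score / temperature))
    total = sum(weights)
    return [w / total for w in weights]


def sample_negatives(candidates, entities, k, seed=0):
    """Draw up to k distinct candidate indices, favouring low-similarity candidates."""
    rng = random.Random(seed)
    weights = sampling_weights(candidates, entities)
    pool = list(range(len(candidates)))
    chosen = []
    for _ in range(min(k, len(pool))):
        pick = rng.choices(pool, weights=[weights[i] for i in pool], k=1)[0]
        chosen.append(pick)
        pool.remove(pick)  # sample without replacement
    return chosen


# Toy vectors: the first candidate is nearly identical to the labeled entity,
# so it is the least likely to be drawn as a negative sample.
entities = [[1.0, 0.0]]
candidates = [[0.9, 0.1], [0.0, 1.0], [-1.0, 0.2]]
negatives = sample_negatives(candidates, entities, k=2)
```

Under these assumptions, a candidate that closely resembles a labeled entity (and so may itself be an unlabeled entity) is rarely chosen as a negative, which is the effect the abstract attributes to similarity-based sampling relative to random negative sampling.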
Keywords: named entity recognition; entity omission; distant supervision; negative sampling; data augmentation
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]