Named Entity Recognition Method Combined with a Self-Training Model

Authors: XIAO Wei; ZHENG Gengsheng [1,2]; CHEN Yujia

Affiliations: [1] School of Computer Science & Engineering, School of Artificial Intelligence, Wuhan Institute of Technology, Wuhan 430205, Hubei, China; [2] Hubei Key Laboratory of Intelligent Robot, Wuhan 430205, Hubei, China

Source: Journal of Shandong University (Engineering Science), 2024, No. 2, pp. 96-102 (7 pages)

Funding: National Natural Science Foundation of China, Young Scientists Fund (62106179).

Abstract: Named entity recognition (NER) datasets often contain too few samples for some entity categories, which weakens a model's ability to learn those categories' features and lowers overall performance. To address this problem, this study proposes an NER method that incorporates a self-training model. A teacher model is first trained on an existing labeled NER dataset. An improved text similarity function is then used to retrieve the unlabeled text most similar to the original dataset, and the teacher model generates pseudo-labels for that text. The pseudo-labeled text is mixed with the labeled dataset to train a new student model for the downstream NER task. Experimental results show that, compared with the baseline model, the method achieves better performance on the public datasets MSRA and CONLL03 and on a legal entity recognition dataset.
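The teacher-student pipeline described in the abstract can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not the authors' implementation: the abstract does not specify the model architecture or the improved similarity function, so `MajorityTagger` (a most-frequent-tag lookup), `jaccard` (plain token overlap), `select_similar`, `self_train`, and the example sentences are all hypothetical stand-ins.

```python
from collections import Counter, defaultdict


class MajorityTagger:
    """Toy tagger (assumption): remembers the most frequent tag per token."""

    def __init__(self):
        self.best_tag = {}

    def fit(self, sentences):
        counts = defaultdict(Counter)
        for tokens, tags in sentences:
            for tok, tag in zip(tokens, tags):
                counts[tok][tag] += 1
        self.best_tag = {tok: c.most_common(1)[0][0] for tok, c in counts.items()}
        return self

    def predict(self, tokens):
        # Unknown tokens fall back to the "outside" tag.
        return [self.best_tag.get(tok, "O") for tok in tokens]


def jaccard(a, b):
    """Plain Jaccard overlap; a placeholder for the paper's improved similarity."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def select_similar(labeled, unlabeled, k):
    """Keep the k unlabeled sentences closest to any labeled sentence."""
    def score(tokens):
        return max(jaccard(tokens, lab_tokens) for lab_tokens, _ in labeled)
    return sorted(unlabeled, key=score, reverse=True)[:k]


def self_train(labeled, unlabeled, k):
    # 1. Train the teacher on the labeled data.
    teacher = MajorityTagger().fit(labeled)
    # 2. Retrieve the unlabeled text most similar to the labeled data.
    selected = select_similar(labeled, unlabeled, k)
    # 3. Let the teacher generate pseudo-labels for the retrieved text.
    pseudo = [(tokens, teacher.predict(tokens)) for tokens in selected]
    # 4. Train a fresh student on labeled plus pseudo-labeled data.
    student = MajorityTagger().fit(labeled + pseudo)
    return student, pseudo


labeled = [
    (["Alice", "visited", "Wuhan"], ["B-PER", "O", "B-LOC"]),
    (["Bob", "lives", "in", "Hubei"], ["B-PER", "O", "O", "B-LOC"]),
]
unlabeled = [
    ["Alice", "lives", "in", "Wuhan"],
    ["stock", "prices", "fell", "sharply"],
]
student, pseudo = self_train(labeled, unlabeled, k=1)
print(pseudo)
```

With this toy data, only the first unlabeled sentence is selected, because the second shares no tokens with the labeled set. A lookup tagger gains nothing from its own pseudo-labels, so the sketch shows only the data flow; any benefit reported in the paper depends on the real models and similarity function.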

Keywords: named entity recognition; self-training; text similarity; natural language processing; few-shot

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
