Relation extraction method based on negative training and transfer learning (基于负训练和迁移学习的关系抽取方法)  Cited by: 2

Authors: CHEN Kezheng; GUO Xiaoran[3]; ZHONG Yong[1,2]; LI Zhenping

Affiliations: [1] Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu, Sichuan 610213, China; [2] School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China; [3] School of Mathematics and Computer Science, Northwest Minzu University, Lanzhou, Gansu 730124, China

Source: Journal of Computer Applications (《计算机应用》), 2023, No. 8, pp. 2426-2430 (5 pages)

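Funding: Sichuan Province Science and Technology Achievement Transfer and Transformation Platform Project (2020ZHCG0002); Fundamental Research Funds for the Central Universities (Young Teachers Innovation) Project (31920210090).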

Abstract: In relation extraction tasks, distant supervision is a common method for automatic data labeling. However, it introduces a large amount of noisy data, which degrades model performance. To address the noisy-data problem, a relation extraction method based on negative training and transfer learning is proposed. First, a noisy-data recognition model is trained with negative training. Then, the noisy data are filtered and relabeled according to the predicted probability of each sample. Finally, transfer learning is used to address the domain shift problem in distant supervision, further improving the precision and recall of the model. A relation extraction dataset with ethnic characteristics was constructed based on Thangka culture. Experimental results show that the F1 score of the proposed method reaches 91.67%, which is 3.95 percentage points higher than that of the SENT (Sentence level distant relation Extraction via Negative Training) method, and much higher than those of relation extraction methods based on BERT (Bidirectional Encoder Representations from Transformers), BiLSTM+ATT (Bi-directional Long Short-Term Memory and Attention), and PCNN (Piecewise Convolutional Neural Network).
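The first two steps named in the abstract (negative training, then probability-based filtering and relabeling) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the common formulation of negative training, in which a complementary label (any class other than the given, possibly noisy one) is sampled and the model is penalised with -log(1 - p) for the probability it assigns to that class. The function names and the thresholds `keep_thr` and `relabel_thr` are hypothetical; the abstract does not state the actual decision rule or values. The transfer learning step is not covered here.

```python
import math
import random


def softmax(logits):
    """Convert raw scores to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def negative_training_loss(probs, noisy_label, rng):
    """Negative-training loss for one sample (assumed formulation).

    A complementary label is drawn uniformly from all classes except the
    given label, and the loss -log(1 - p_complementary) pushes probability
    mass away from it ("this sample is NOT of class k").
    """
    candidates = [k for k in range(len(probs)) if k != noisy_label]
    comp = rng.choice(candidates)
    loss = -math.log(1.0 - probs[comp] + 1e-12)  # epsilon avoids log(0)
    return loss, comp


def filter_and_relabel(probs, noisy_label, keep_thr=0.5, relabel_thr=0.9):
    """Decide what to do with a distantly supervised sample.

    keep_thr and relabel_thr are hypothetical values for illustration only.
    """
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[noisy_label] >= keep_thr:
        return "keep", noisy_label   # model agrees with the given label
    if probs[best] >= relabel_thr:
        return "relabel", best       # model is confident in another class
    return "drop", None              # treated as noise and filtered out


if __name__ == "__main__":
    rng = random.Random(0)
    probs = softmax([2.0, 0.5, -1.0])
    print(negative_training_loss(probs, noisy_label=0, rng=rng))
    print(filter_and_relabel([0.03, 0.95, 0.02], noisy_label=0))
```

In this sketch a sample is kept when the model agrees with its distant-supervision label, relabeled when the model is confident in a different relation, and dropped otherwise; a real system would compute `probs` from a trained classifier rather than fixed numbers.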

Keywords: distant supervision; negative training; knowledge graph; relation extraction; transfer learning; natural language processing

Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
