Authors: CHEN Kezheng; ZHONG Yong [1,2]
Affiliations: [1] Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu, Sichuan 610041, China; [2] School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China
Source: Journal of Computer Applications, 2022, No. S02, pp. 42-46 (5 pages)
Fund: Sichuan Science and Technology Achievement Transfer and Transformation Platform Project (2020ZHCG0002)
Abstract: Distant supervision for relation extraction can construct datasets automatically without manual annotation, but it also produces incorrect labels. To address this mislabeling problem, a distant supervision noise filtering method based on entity attention and negative training was proposed. First, the features of sentences and entities were extracted with the BERT (Bidirectional Encoder Representations from Transformers) pre-trained language model or a Bi-directional Long Short-Term Memory (BiLSTM) network. Then, the correlations between the entities and each word in the sentence were computed and used as the attention weights of the head entity and the tail entity. Next, negative training was used to accurately capture the key features of the noisy data. Finally, the noisy data were filtered through a dynamic threshold function based on the predicted values, and the filtered noisy samples were relabeled according to their maximum predicted value. Experiments were carried out on a dataset from the artificial intelligence domain: when BERT was used to extract sentence and entity features, the proposed method improved the F1 score by 2.23 percentage points compared with SENTBERT (Sentence-level distant relation Extraction via Negative Training with BERT); when BiLSTM was used, the F1 score improved by 2.53 percentage points compared with SENTBiLSTM (the same baseline with BiLSTM). The experimental results verify that the proposed method filters the noisy data generated by distant supervision more effectively.
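The two components the abstract describes, negative training (pushing down the probability of a randomly sampled complementary label rather than pulling up the possibly noisy one) and prediction-based dynamic-threshold filtering with relabeling, can be sketched as follows. This is a minimal illustration in PyTorch, not the paper's implementation: the exact loss form and the threshold function (here assumed to scale with the sample's maximum predicted probability) are assumptions.

```python
import torch
import torch.nn.functional as F

def negative_training_loss(logits, labels, num_classes):
    """Negative training sketch: for each sample, draw a complementary
    label (any class other than the given, possibly noisy, label) and
    minimize the probability assigned to it via -log(1 - p_comp)."""
    probs = F.softmax(logits, dim=-1)
    # Sample uniformly from the num_classes - 1 classes != labels:
    # draw in [0, num_classes - 2], then shift past the given label.
    comp = torch.randint(0, num_classes - 1, labels.shape)
    comp = comp + (comp >= labels).long()
    p_comp = probs.gather(1, comp.unsqueeze(1)).squeeze(1)
    return -torch.log(1.0 - p_comp + 1e-8).mean()

def filter_noise(probs, labels, base_threshold=0.5):
    """Dynamic-threshold filtering sketch: flag a sample as noise when
    the probability of its annotated label falls below a threshold that
    scales with the model's own maximum prediction (assumed form), and
    propose the argmax class as the corrected label."""
    p_label = probs.gather(1, labels.unsqueeze(1)).squeeze(1)
    p_max, relabel = probs.max(dim=1)
    threshold = base_threshold * p_max
    is_noise = p_label < threshold
    return is_noise, relabel
```

A confidently misannotated sample (model mass concentrated on a class other than its label) falls below the dynamic threshold and is flagged, then reassigned the argmax class, matching the relabeling step described above.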
Keywords: distant supervision; negative training; attention mechanism; relation extraction; dynamic threshold; knowledge graph
Classification Code: TP391.1 [Automation and Computer Technology - Computer Application Technology]