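Affiliations: [1] College of Artificial Intelligence, Nanjing Agricultural University, Nanjing 210095, Jiangsu, China; [2] National Engineering and Technology Center for Information Agriculture, Nanjing Agricultural University, Nanjing 210095, Jiangsu, China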
Source: Journal of Nanjing Agricultural University, 2020, No. 6, pp. 1151-1161 (11 pages)
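Funding: National Key Research and Development Program of China (2016YFD0300607); Postgraduate Research & Practice Innovation Program of Jiangsu Province (SJCX18_0198).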
Abstract: [Objectives] To automatically extract entities, and the relations between diseases, pests and weeds and pesticides (drugs), from rice disease, pest and weed control texts, providing data for building knowledge graphs in the crop system domain. [Methods] Control texts contain many entities without clear boundaries, and drug entities are linked to disease, pest and weed entities by multiple types of relations. To handle both characteristics, a joint entity recognition and relation extraction algorithm for rice diseases, pests and weeds (JE-DPW) was designed. It combines a double-layer bi-directional long short-term memory (BiLSTM) network with an attention mechanism and uses a new annotation scheme. At the decoding layer, the method uses the forward and backward propagation of the BiLSTM network to strengthen the extraction of complex semantic features from control texts, and a softmax classifier then assigns a category label to each character to perform entity recognition. At the same time, the attention mechanism determines the relations between the current character and the preceding characters, so that entities and multiple relations are extracted jointly. [Results] The model was trained on a disease, pest, weed and drug control dataset containing 7380 entities and 8605 relations. On the test set, JE-DPW reached an accuracy of 91.3% for entity extraction and 76.8% for relation classification, and 88.1% for recognizing entities without clear boundaries. Its entity extraction accuracy was 8.1% higher than that of a BiLSTM-based entity extraction method. Its relation classification accuracy was 22.6% and 19.7% higher than that of methods based on a recurrent neural network (RNN) and a long short-term memory (LSTM) network, respectively. As the number of relations increased, JE-DPW kept an F1 advantage of 17.4% to 20.1% in relation extraction. [Conclusions] The proposed algorithm effectively improves the accuracy of joint entity and relation extraction from rice disease, pest and weed control texts and speeds up the construction of knowledge bases for the crop system domain.
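The abstract describes JE-DPW only at a high level. The sketch below illustrates the general shape of such a joint model under stated assumptions, not the authors' implementation: two stacked BiLSTM layers (our reading of "double-layer") over character embeddings, a softmax classifier that assigns a tag to every character, and an additive attention scorer that, for each relation type, scores the current character against every preceding character with an independent sigmoid, so that one character pair can carry several relations. All names (`init_params`, `joint_forward`, `relation_scores`, `W_q`, `W_k`, `v`), the dimensions, the tag and relation counts and the 0.5 threshold are hypothetical; the paper's new annotation scheme is not reproduced, and the weights are random and untrained.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def init_lstm(d_in, d_h):
    # Stacked gate weights (input, forget, output, candidate): W, U, b
    return (rng.normal(0, 0.1, (4 * d_h, d_in)),
            rng.normal(0, 0.1, (4 * d_h, d_h)),
            np.zeros(4 * d_h))

def lstm_pass(x, W, U, b):
    """Run one LSTM direction over x of shape (T, d_in); return (T, d_h)."""
    d_h = U.shape[1]
    h, c = np.zeros(d_h), np.zeros(d_h)
    out = np.zeros((x.shape[0], d_h))
    for t in range(x.shape[0]):
        i, f, o, g = np.split(W @ x[t] + U @ h + b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        out[t] = h
    return out

def bilstm(x, fwd, bwd):
    h_f = lstm_pass(x, *fwd)                    # left-to-right
    h_b = lstm_pass(x[::-1], *bwd)[::-1]        # right-to-left, re-aligned
    return np.concatenate([h_f, h_b], axis=1)   # (T, 2*d_h)

def relation_scores(H, W_q, W_k, v):
    """Hypothetical additive attention: s[i, j, r] = v_r . tanh(W_q h_i + W_k h_j)."""
    T = H.shape[0]
    e = np.tanh((H @ W_q.T)[:, None, :] + (H @ W_k.T)[None, :, :])  # (T, T, d_a)
    p = sigmoid(e @ v.T)                        # (T, T, R), one sigmoid per relation type
    mask = np.tril(np.ones((T, T)), k=-1)       # keep only j < i (preceding characters)
    return p * mask[:, :, None]

def init_params(vocab, d_emb=16, d_h=8, n_tags=5, n_rel=3, d_a=8):
    d = 2 * d_h
    return {
        "emb": rng.normal(0, 0.1, (vocab, d_emb)),
        "layer1": (init_lstm(d_emb, d_h), init_lstm(d_emb, d_h)),
        "layer2": (init_lstm(d, d_h), init_lstm(d, d_h)),
        "W_tag": rng.normal(0, 0.1, (n_tags, d)), "b_tag": np.zeros(n_tags),
        "W_q": rng.normal(0, 0.1, (d_a, d)), "W_k": rng.normal(0, 0.1, (d_a, d)),
        "v": rng.normal(0, 0.1, (n_rel, d_a)),
    }

def joint_forward(char_ids, params, threshold=0.5):
    x = params["emb"][char_ids]
    H = bilstm(bilstm(x, *params["layer1"]), *params["layer2"])  # two stacked BiLSTMs
    tag_probs = softmax(H @ params["W_tag"].T + params["b_tag"])  # per-character tags
    rel_probs = relation_scores(H, params["W_q"], params["W_k"], params["v"])
    tags = tag_probs.argmax(axis=1)
    triples = [(int(i), int(j), int(r))
               for i, j, r in zip(*np.nonzero(rel_probs > threshold))]
    return tags, tag_probs, rel_probs, triples

# Usage on a made-up 6-character input (character ids are arbitrary)
params = init_params(vocab=100)
char_ids = np.array([3, 17, 42, 8, 56, 9])
tags, tag_probs, rel_probs, triples = joint_forward(char_ids, params)
print(tags.shape, rel_probs.shape)
```

Because the weights are untrained, the predicted tags and `(current, previous, relation)` triples are meaningless; the point is the data flow. Using an independent sigmoid per relation type, rather than one softmax over relations, is what lets a single drug and pest pair be assigned more than one relation, which matches the multi-relation setting described above. In practice both heads would be trained jointly, for example with cross-entropy on the tags and binary cross-entropy on the relation scores.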
Keywords: diseases, pests and weeds; entity relation extraction; long short-term memory network; attention mechanism
CLC number: TP391.1 [Automation and Computer Technology: Computer Application Technology]