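Affiliations: [1] College of Informatics, Huazhong Agricultural University, Wuhan 430070; [2] Hubei Engineering Technology Research Center of Agricultural Big Data (Huazhong Agricultural University), Wuhan 430070; [3] School of Cyber Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074; [4] Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan 430074
Source: Transactions of the Chinese Society of Agricultural Engineering (《农业工程学报》), 2022, No. 8, pp. 263-270 (8 pages)
Funding: National Key Research and Development Program of China (2018YFC1604005); Fundamental Research Funds for the Central Universities (2662019PY070, 2662022JC004, 2662022XXYJ001).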
Abstract: To improve the efficiency and accuracy of relation extraction in the food safety domain, this study collected a food safety corpus, annotated its entities and relations, and built a domain-specific dataset for relation extraction. It also proposes BERT-PCNN-ATT-Jieba, a relation extraction model for the food safety domain. The model uses the pre-trained Bidirectional Encoder Representations from Transformers (BERT) model to generate the input word vectors, exploits the ability of the piecewise max pooling layer of the Piecewise Convolutional Neural Network (PCNN) to capture local sentence information, and adds an attention mechanism between the piecewise max pooling layer and the classification layer to extract higher-level semantics. In addition, to account for the characteristics of Chinese text, the corpus is segmented with Jieba before BERT's random masking, so that masking in the Masked Language Model (MLM) task is applied to whole words rather than single characters. This reduces the semantic loss in the sentences fed to the model and makes relation extraction more effective. On the dataset built in this study, BERT-PCNN-ATT-Jieba was compared with six models: the classic Convolutional Neural Network (CNN) and PCNN, and the BERT-based CNN, PCNN, PCNN-ATT, and PCNN-Jieba. The proposed model performed best, with a precision of 84.72%, a recall of 81.78%, and an F-value of 83.22%. The model offers a reference for knowledge extraction in the food safety domain, reduces the cost of automatically constructing knowledge graphs for the domain, and provides a basis for applications built on such knowledge graphs, including question answering, knowledge retrieval, data sharing, and intelligent food safety supervision.
A knowledge graph (semantic network) organizes real-world entities and the relationships between them in a graph database. Relation extraction is one of the most important steps in the automatic construction of knowledge graphs. However, there is currently no public knowledge graph dataset for the food safety field. Existing relation extraction models are largely confined to open standard datasets, and most cannot extract data in a specific domain. In this study, a professional dataset was constructed for relation extraction in the food safety field, using BERT and an improved PCNN model. The corpus was first collected, and its entities and relation categories were annotated. A relation extraction model for the food safety field, BERT-PCNN-Attention-based Neural Networks (ATT)-Jieba, was then proposed. The BERT pre-trained model was selected to generate the input word vectors. The piecewise max pooling layer of the PCNN model was then used to capture the local information of sentences. An attention mechanism was added between the piecewise max pooling layer and the classification layer to further extract high-level semantics. In addition, Jieba word segmentation was used to segment the Chinese corpus before the random masking step of the BERT model, so that word units instead of characters were masked when executing the Masked Language Model (MLM). As a result, the semantic loss of the sentences input to the training model was reduced, making relation extraction more efficient. The performance of the BERT-PCNN-ATT-Jieba model was compared with the classical CNN and PCNN models, as well as the BERT-based CNN, PCNN, PCNN-ATT, and PCNN-Jieba models, on the same dataset and with consistent experimental parameters. Comparing the PCNN with the BERT-PCNN model, the p
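The three headline figures in the abstract (84.72%, 81.78%, 83.22%) can be cross-checked against each other. The sketch below assumes that the reported F-value is the balanced F1 score, i.e. the harmonic mean of precision and recall; the abstract does not state the formula, and the function name `f_value` is only illustrative.

```python
def f_value(precision: float, recall: float) -> float:
    """Balanced F1: harmonic mean of precision and recall.

    Assumption: the abstract's "F-value" is this balanced form;
    the source does not spell out the formula.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures taken from the abstract, in percent.
precision = 84.72
recall = 81.78

print(round(f_value(precision, recall), 2))  # 83.22
```

Under that assumption the result agrees with the reported 83.22%, which also suggests the first figure is a precision rather than an overall accuracy.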
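Keywords: food safety; model; relation extraction; knowledge graph; attention mechanism; BERT; PCNN
Classification code: TP391 [Automation and Computer Technology - Computer Application Technology]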
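As a rough illustration of the piecewise max pooling that the abstract attributes to the PCNN layer, the sketch below splits a sentence's convolutional features into three segments around the two entity positions and keeps the maximum of each filter per segment. It is a minimal pure-Python sketch and not the authors' implementation: the function name, the exact segment boundaries, and the 0.0 fill value for an empty segment are all assumptions.

```python
from typing import List, Sequence


def piecewise_max_pool(
    features: Sequence[Sequence[float]],  # shape: [seq_len][num_filters]
    head_pos: int,                        # token index of one entity
    tail_pos: int,                        # token index of the other entity
) -> List[float]:
    """Return 3 * num_filters values: per-filter maxima of three segments."""
    if not features:
        raise ValueError("features must contain at least one position")
    first, second = sorted((head_pos, tail_pos))
    if first < 0 or second >= len(features):
        raise ValueError("entity positions must lie inside the sentence")

    # Assumed boundaries: each entity token closes the segment it ends.
    segments = [
        features[: first + 1],
        features[first + 1 : second + 1],
        features[second + 1 :],
    ]
    num_filters = len(features[0])
    pooled: List[float] = []
    for segment in segments:
        for f in range(num_filters):
            # Assumption: an empty segment contributes 0.0 for every filter.
            pooled.append(max((row[f] for row in segment), default=0.0))
    return pooled


# Toy input: 5 token positions, 2 convolution filters.
toy = [[0.1, 0.5], [0.9, 0.2], [0.3, 0.4], [0.7, 0.8], [0.2, 0.6]]
print(piecewise_max_pool(toy, head_pos=1, tail_pos=3))
# [0.9, 0.5, 0.7, 0.8, 0.2, 0.6]
```

Compared with a single max over the whole sentence, the three-segment output keeps some information about where a strong feature occurs relative to the two entities. In the model described by the abstract, this pooled vector would then pass through the attention mechanism before classification.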