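Affiliations: [1] School of Information Science and Technology, Beijing Forestry University, Beijing 100083 [2] Engineering Research Center for Forestry-oriented Intelligent Information Processing, National Forestry and Grassland Administration, Beijing 100083
Source: Journal of Northeast Agricultural University, 2022, Issue 5, pp. 20-31 (12 pages)
Funding: National Natural Science Foundation of China (32071775, 61702038).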
Abstract: Classifying Chinese plum blossom literature by research direction (genes, breeding, abiotic stress, etc.) is an important preprocessing step in building a plum blossom knowledge graph, and it underpins applications such as semantic retrieval and intelligent question answering over plum blossom research information organized by those directions. To explore the feasibility of applying text classification to plum blossom research information, a text classification method based on an improved ERNIE-RCNN model was proposed. First, because the lack of a plum blossom research information dataset led to poor results from common text classification methods, a Chinese plum blossom research information text dataset covering six research directions was constructed. Second, because the encoding mechanisms of traditional classification models struggle to capture the logic of a text and reproduce its semantics imprecisely, the pre-trained model ERNIE was introduced to encode the text, strengthening feature extraction and semantic representation during encoding. Finally, to better preserve word order and text features and to improve classification correctness, a TextRCNN model was fused on top of the ERNIE encoding for classification, and the dropout rate of the TextRCNN convolutional layer was adjusted to improve the generalization and classification ability of the model. The improved ERNIE-RCNN model was compared with an ERNIE-RCNN model in which only ERNIE was improved, the original ERNIE-RCNN model, the ERNIE model, the BERT model, and the TextRCNN model. The experimental results showed that the improved ERNIE-RCNN model outperformed the other models on every evaluation metric, with precision, recall, and F1 value no less than 91.53%, 90.27%, and 92.35%, respectively, and an accuracy of 95.35%. The text classification method based on the improved ERNIE-RCNN can meet practical needs.
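Keywords: plum blossom; research information; text classification; ERNIE; TextRCNN; deep learning
Classification code: TP391.1 [Automation and Computer Technology: Computer Application Technology]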
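The abstract reports precision, recall, and F1 as lower bounds ("no less than") alongside a single accuracy figure, which reads as per-class scores with the minimum taken over the six research directions; that reading is an assumption, since the record does not say how the metrics were aggregated. A minimal sketch of such a computation follows. The label names and the toy predictions are hypothetical and are not data from the paper.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return ({label: (precision, recall, f1)}, accuracy) for single-label data."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    metrics = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)

    accuracy = sum(tp.values()) / len(y_true) if y_true else 0.0
    return metrics, accuracy


# Hypothetical toy labels for three of the research directions.
y_true = ["gene", "gene", "breeding", "breeding", "abiotic_stress", "abiotic_stress"]
y_pred = ["gene", "breeding", "breeding", "breeding", "abiotic_stress", "gene"]

metrics, accuracy = per_class_metrics(y_true, y_pred)
min_precision = min(p for p, _, _ in metrics.values())
min_recall = min(r for _, r, _ in metrics.values())
min_f1 = min(f for _, _, f in metrics.values())
print(metrics, accuracy, min_precision, min_recall, min_f1)
```

Under this reading, each reported lower bound can come from a different class, which is why the minimum F1 value can exceed both the minimum precision and the minimum recall.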