Research on text classification based on the BERT-BiGRU model  (cited by: 12)


Authors: WANG Ziyin; YU Qing [1,2] (Tianjin Key Laboratory of Intelligent Computing and Network Security, Tianjin University of Technology, Tianjin 300384, China; School of Computer Science and Engineering, Tianjin University of Technology, Tianjin 300384, China)

Affiliations: [1] Tianjin Key Laboratory of Intelligent Computing and Network Security, Tianjin University of Technology, Tianjin 300384; [2] School of Computer Science and Engineering, Tianjin University of Technology, Tianjin 300384

Source: Journal of Tianjin University of Technology, 2021, No. 4, pp. 40-46 (7 pages)

Funding: National Natural Science Foundation of China (71501141).

Abstract: Text classification is a typical application of natural language processing, and deep learning is currently the most widely used approach to it. This paper studies the classification of Chinese text, which exhibits characteristics such as metaphorical expression, semantic polysemy, and grammatical specificity, and proposes a bidirectional encoder representations from transformers-bidirectional gate recurrent unit (BERT-BiGRU) model. BERT replaces the traditional Word2vec model for word-vector representation: character representations are computed from their context, so contextual information is fused while the representation adjusts to each character's polysemy, strengthening its semantic representation. A BiGRU layer is added after BERT and takes the trained word vectors as input; it extracts text features in both directions simultaneously, giving the model a stronger text representation and more accurate classification. Using the proposed BERT-BiGRU model for text classification, the final accuracy reaches 0.93, the recall 0.94, and the F1 score 0.93. Comparative experiments against other models show that BERT-BiGRU performs well on Chinese text classification tasks.
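The architecture the abstract describes (contextual vectors from BERT fed into a bidirectional GRU, whose two final hidden states drive the classifier) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes PyTorch, BERT's standard hidden size of 768, and a hypothetical GRU hidden size and class count that the abstract does not specify; a random tensor stands in for BERT's output so the sketch runs without downloading pretrained weights.

```python
import torch
import torch.nn as nn

class BiGRUClassifier(nn.Module):
    """BiGRU classification head of a BERT-BiGRU pipeline (sketch).

    In the paper's model, BERT supplies the contextual character
    vectors (hidden size 768) that this layer consumes.
    """

    def __init__(self, bert_hidden=768, gru_hidden=128, num_classes=10):
        super().__init__()
        self.bigru = nn.GRU(bert_hidden, gru_hidden,
                            batch_first=True, bidirectional=True)
        # Forward and backward final states are concatenated,
        # so the classifier sees 2 * gru_hidden features.
        self.fc = nn.Linear(2 * gru_hidden, num_classes)

    def forward(self, bert_output):
        # bert_output: (batch, seq_len, bert_hidden)
        _, h_n = self.bigru(bert_output)         # h_n: (2, batch, gru_hidden)
        h = torch.cat([h_n[0], h_n[1]], dim=-1)  # (batch, 2 * gru_hidden)
        return self.fc(h)                        # (batch, num_classes)

model = BiGRUClassifier()
# Stand-in for BERT's last hidden states: batch of 4, sequence length 32.
fake_bert_output = torch.randn(4, 32, 768)
logits = model(fake_bert_output)
print(logits.shape)  # torch.Size([4, 10])
```

In a full pipeline, `fake_bert_output` would be replaced by the `last_hidden_state` of a pretrained Chinese BERT, and the logits would be trained with cross-entropy against the text labels.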

Keywords: text classification; deep learning; bidirectional encoder representations from transformers (BERT) model; bidirectional gate recurrent unit (BiGRU)

Classification code: TP391.1 (Automation and Computer Technology: Computer Application Technology)

 
