Authors: WANG Ziyin, YU Qing [1,2]
Affiliations: [1] Tianjin Key Laboratory of Intelligent Computing and Network Security, Tianjin University of Technology, Tianjin 300384, China; [2] School of Computer Science and Engineering, Tianjin University of Technology, Tianjin 300384, China
Source: Journal of Tianjin University of Technology, 2021, No. 4, pp. 40-46 (7 pages).
Fund: National Natural Science Foundation of China (71501141).
Abstract: Text classification is a typical application of natural language processing, and deep learning methods are currently the most widely used approach to it. Chinese text poses particular challenges, such as metaphorical expressions, semantic polysemy, and grammatical idiosyncrasies, and this paper studies text classification under these conditions. It proposes a bidirectional encoder representations from transformers-bidirectional gate recurrent unit (BERT-BiGRU) model. BERT replaces the traditional Word2vec model for word-vector representation: it computes each character's representation from its context, fusing contextual information while adjusting for polysemy, which strengthens the semantic representation of each character. A BiGRU layer is added after the BERT model and takes the trained word vectors as input; it extracts features from the text in both directions simultaneously, giving the model a stronger text representation and a more accurate classification result. Using the proposed BERT-BiGRU model for text classification, the final accuracy reaches 0.93, the recall reaches 0.94, and the F1 score reaches 0.93. Comparison with other models shows that BERT-BiGRU performs well on Chinese text classification tasks.
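The architecture the abstract describes (BERT contextual embeddings feeding a bidirectional GRU whose final states drive a linear classifier) can be sketched in a few lines. The following is a minimal illustration, assuming PyTorch and the HuggingFace transformers library; the checkpoint name, GRU hidden size, and number of classes are illustrative assumptions, not the paper's reported settings.

```python
# Minimal sketch of a BERT-BiGRU classifier as described in the abstract.
# Hyperparameters (gru_hidden, num_classes) and the pretrained checkpoint
# are illustrative assumptions, not the paper's exact configuration.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class BertBiGRUClassifier(nn.Module):
    def __init__(self, num_classes=10, gru_hidden=128,
                 pretrained="bert-base-chinese"):
        super().__init__()
        # BERT replaces Word2vec: contextual vectors for each character.
        self.bert = BertModel.from_pretrained(pretrained)
        # Bidirectional GRU reads the BERT outputs in both directions.
        self.bigru = nn.GRU(self.bert.config.hidden_size, gru_hidden,
                            batch_first=True, bidirectional=True)
        # Concatenated forward/backward final states feed the classifier.
        self.fc = nn.Linear(2 * gru_hidden, num_classes)

    def forward(self, input_ids, attention_mask):
        # [batch, seq_len, hidden] contextual embeddings from BERT.
        hidden = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        # h_n: [2, batch, gru_hidden] -- final forward and backward states.
        _, h_n = self.bigru(hidden)
        feats = torch.cat([h_n[0], h_n[1]], dim=-1)
        return self.fc(feats)

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertBiGRUClassifier()
batch = tokenizer(["这是一条测试文本"], return_tensors="pt",
                  padding=True, truncation=True)
logits = model(batch["input_ids"], batch["attention_mask"])  # [1, num_classes]
```

Taking the GRU's final hidden states from both directions, rather than pooling BERT's [CLS] token directly, is one common way to realize the "feature extraction from two directions" the abstract mentions; the paper may differ in pooling details.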
Keywords: text classification; deep learning; bidirectional encoder representations from transformers (BERT) model; bidirectional gate recurrent unit (BiGRU)
CLC number: TP391.1 [Automation and Computer Technology - Computer Application Technology]