Authors: ZHENG Cheng-yu; WANG Xin[1]; WANG Ting; YIN Tian-tian; DENG Ya-ping (School of Mathematics and Computer Science, Yunnan Minzu University, Kunming 650500, China)
Affiliation: [1] School of Mathematics and Computer Science, Yunnan Minzu University, Kunming 650500, China
Source: Science Technology and Engineering (《科学技术与工程》), 2022, Issue 10, pp. 4033-4038 (6 pages)
Funding: National Natural Science Foundation of China (61363022); Scientific Research Fund of the Yunnan Provincial Department of Education (2021Y670).
Abstract: Static word-vector representations such as word2vec and GloVe cannot fully capture text semantics, and the predictive performance of current mainstream neural network models on text classification tasks often depends on the specific problem, giving them poor scene adaptability and weak generalization. To address these problems, a Chinese short-text classification method based on a multi-base-model framework (Stacking-Bert) is proposed. The model uses the BERT pre-trained language model to produce character-level vector representations and output deep feature vectors of the text. Heterogeneous base classifiers are then built from the TextCNN, DPCNN, TextRNN, and TextRCNN neural network models, and Stacking ensemble learning combines the different feature representations extracted from the text vectors to improve the model's generalization ability. Finally, a support vector machine (SVM) serves as the meta-classifier for training and prediction. Comparative experiments against text classification algorithms such as word2vec-CNN, word2vec-BiLSTM, BERT-TextCNN, BERT-DPCNN, BERT-RNN, and BERT-RCNN on three publicly available Chinese datasets show that the Stacking-Bert ensemble model achieves the highest accuracy, precision, recall, and F1 score, effectively improving the classification performance on Chinese short texts.
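The stacking scheme the abstract describes (heterogeneous base classifiers whose cross-validated predictions feed an SVM meta-classifier) can be sketched with scikit-learn. This is a minimal illustrative sketch, not the authors' implementation: lightweight classifiers stand in for the BERT-based neural base models (TextCNN, DPCNN, TextRNN, TextRCNN), and synthetic features stand in for BERT character-vector representations; only the SVM meta-classifier matches the paper directly.

```python
# Illustrative sketch of Stacking ensemble learning with an SVM meta-classifier.
# NOTE: the base learners and synthetic data below are stand-ins; the paper
# uses BERT feature vectors and neural base models (TextCNN, DPCNN, etc.).
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# Synthetic stand-in for BERT-encoded text feature vectors.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Heterogeneous base classifiers (stand-ins for the neural base models).
base_learners = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("nb", GaussianNB()),
    ("knn", KNeighborsClassifier()),
]

# Stacking: base learners' cross-validated predictions become the input
# features of the SVM meta-classifier, as in the paper's final stage.
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=SVC(), cv=5)
stack.fit(X_tr, y_tr)
acc = stack.score(X_te, y_te)
```

The key design point mirrored here is that the meta-classifier never sees the raw features directly during stacking; it learns from out-of-fold predictions of the base models, which is what lets the ensemble combine their differing views of the text representation.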
Keywords: multi-base model framework; BERT pre-trained language model; Stacking ensemble learning; short text classification
Classification Code: TP391.1 [Automation and Computer Technology — Computer Application Technology]