Authors: TANG Yingjie (汤英杰); LIU Yuanhua (刘媛华) [1] (Business School, University of Shanghai for Science and Technology, Shanghai 200093, China)
Source: Journal of University of Shanghai for Science and Technology (《上海理工大学学报》), 2023, No. 2, pp. 189-197, 204 (10 pages in total)
Funding: Project supported by the National Natural Science Foundation of China (71771152).
Abstract: Word vectors produced by traditional models represent sequence, context, grammar, semantics, and deeper-level information poorly. To address this, a deep neural network model for Chinese text classification is proposed that combines a pre-trained model (Roberta) with deep feature word vectors. The Roberta model generates sentence vectors carrying contextual semantic and grammatical information, together with word vectors carrying sentence-structure features. A DPCNN model and a revised gated recurrent unit (RGRU) then extract and fuse features from the word vectors, yielding feature word vectors that contain deep structural and local information. The sentence vector and the feature word vector are fused into a new vector, which is passed through a softmax activation layer to produce the output. With F1 score, accuracy, and recall as the evaluation metrics, the model reaches 98.41%, 98.44%, and 98.41%, respectively, on the THUCNews long-text dataset. The model also achieves good results on short-text classification.
Keywords: pre-trained model; Roberta model; DPCNN model; feature word vector; Chinese text classification
Classification code: TP391.1 [Automation and Computer Technology: Computer Application Technology]