Research on Chinese text classification based on the ERNIE model  (Cited by: 5)


Authors: BI Yunshan; QIAN Yaguan[1]; ZHANG Chaohua; PAN Jun; XU Qinghua[1] (School of Sciences, Zhejiang University of Science and Technology, Hangzhou 310023, Zhejiang, China)

Affiliation: [1] School of Sciences, Zhejiang University of Science and Technology, Hangzhou 310023

Source: Journal of Zhejiang University of Science and Technology, 2021, No. 6, pp. 461-468, 476 (9 pages in total)

Funding: Key R&D Program of the Ministry of Science and Technology (2018YFB2100400); National Natural Science Foundation of China (61902082).

Abstract: To address the problem that word-vector representations in deep-learning-based Chinese text classification cannot make full use of semantic information, a Chinese text classification method based on the ERNIE (enhanced representation through knowledge integration) model is proposed. First, the ERNIE model is used to obtain a distributed text representation with richer semantic expression. Then, a deep convolutional neural network further extracts features from the contextual encodings to obtain a deeper representation of the text. Finally, a softmax classifier performs the Chinese text classification. Multiple groups of comparative experiments on three public Chinese data sets show that, compared with the conventional classification model based on BERT (bidirectional encoder representations from transformers), the proposed model improves accuracy and F1 score by an average of 6.34% and 4.82%, respectively, indicating that the ERNIE-based method can effectively improve the performance of Chinese text classification. The method classifies text more accurately on multi-domain Chinese text data sets and can provide a reference for subsequent research in natural language processing.
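The three stages named in the abstract (contextual encoding, convolutional feature extraction, softmax classification) can be illustrated with a toy forward pass. This is only a sketch under loud assumptions: random vectors stand in for the ERNIE output, a single convolution layer with max-over-time pooling stands in for the paper's deep convolutional network, and every function name, size, and weight below is hypothetical. It is not the authors' implementation.

```python
import math
import random


def conv1d_relu(seq, filters, bias):
    """Slide each filter (width x dim) over the token sequence and apply ReLU."""
    width = len(filters[0])
    feature_map = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        feats = []
        for filt, b in zip(filters, bias):
            s = b + sum(w * x
                        for f_row, x_row in zip(filt, window)
                        for w, x in zip(f_row, x_row))
            feats.append(max(0.0, s))
        feature_map.append(feats)
    return feature_map  # shape: (positions, n_filters)


def max_over_time(feature_map):
    """Keep the strongest response of each filter across all positions."""
    return [max(column) for column in zip(*feature_map)]


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(token_vectors, filters, bias, weights, class_bias):
    """Encoded tokens -> convolution -> pooling -> linear layer -> softmax."""
    pooled = max_over_time(conv1d_relu(token_vectors, filters, bias))
    logits = [b + sum(w * p for w, p in zip(row, pooled))
              for row, b in zip(weights, class_bias)]
    return softmax(logits)


# Hypothetical toy sizes; a real encoder would output far larger vectors.
SEQ_LEN, DIM, N_FILTERS, WIDTH, N_CLASSES = 8, 4, 3, 3, 2
rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Assumption: random vectors stand in for ERNIE's per-token representations.
token_vectors = rand_matrix(SEQ_LEN, DIM)
filters = [rand_matrix(WIDTH, DIM) for _ in range(N_FILTERS)]
bias = [0.0] * N_FILTERS
weights = rand_matrix(N_CLASSES, N_FILTERS)
class_bias = [0.0] * N_CLASSES

probs = classify(token_vectors, filters, bias, weights, class_bias)
predicted = max(range(N_CLASSES), key=probs.__getitem__)
print(probs, predicted)
```

With untrained random weights the predicted class is meaningless; the sketch only shows how the encoded sequence flows through convolution and pooling into a probability distribution over classes.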

Keywords: natural language processing; text classification; deep learning; convolutional neural network; ERNIE

Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
