Authors: Sheng Xiaohui; Shen Hailong[1] (School of Science, Northeastern University, Shenyang 110819, China)
Affiliation: [1] School of Science, Northeastern University, Shenyang 110819, China
Source: Application Research of Computers, 2023, No. 4, pp. 1019-1023, 1051 (6 pages)
Abstract: To reduce the dependence on labeled data and make full use of large amounts of unlabeled data, this paper proposes STAP (semi-supervised text classification algorithm with data augmentation and similar pseudo-labels). The algorithm uses the EPiDA (easy plug-in data augmentation) framework and self-training to expand a small amount of labeled data. It uses consistency training and similar pseudo-labels to model the relationship between unlabeled data and their augmented samples, as well as the relationship between high-confidence similar unlabeled data. Under the constraints of a supervised cross-entropy loss, an unsupervised consistency loss, and an unsupervised pair loss, it improves the quality of the unlabeled data. Experiments on four text classification datasets show that STAP clearly improves over other classical text classification algorithms.
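Since only the abstract is available here, the following PyTorch-style sketch illustrates one possible reading of the three-part objective it describes (supervised cross-entropy, unsupervised consistency, unsupervised pair loss). The function name, the use of KL divergence for consistency, the cosine-similarity pair term, the confidence threshold, and the loss weights are all assumptions, not the paper's exact formulation.

```python
# Hedged sketch of a STAP-style combined loss; details are assumed, not from the paper.
import torch
import torch.nn.functional as F

def stap_style_loss(model, labeled_x, labeled_y, unlabeled_x, unlabeled_aug_x,
                    conf_threshold=0.9, w_consistency=1.0, w_pair=1.0):
    """Supervised CE + unsupervised consistency + unsupervised pair loss (sketch)."""
    # 1) Supervised cross-entropy on the (possibly augmented) labeled data.
    sup_loss = F.cross_entropy(model(labeled_x), labeled_y)

    # 2) Consistency loss: an unlabeled example and its augmented version
    #    should receive similar predictions (KL divergence is an assumption).
    with torch.no_grad():
        p_unlabeled = F.softmax(model(unlabeled_x), dim=-1)
    log_p_aug = F.log_softmax(model(unlabeled_aug_x), dim=-1)
    consistency_loss = F.kl_div(log_p_aug, p_unlabeled, reduction="batchmean")

    # 3) Pair loss: pull together high-confidence unlabeled examples that share
    #    the same pseudo-label (simple cosine-similarity formulation; assumed).
    conf, pseudo = p_unlabeled.max(dim=-1)
    mask = conf > conf_threshold
    pair_loss = torch.zeros((), device=sup_loss.device)
    if mask.sum() > 1:
        # Logits are reused as features here purely for illustration.
        feats = F.normalize(model(unlabeled_x[mask]), dim=-1)
        same_label = (pseudo[mask].unsqueeze(0) == pseudo[mask].unsqueeze(1)).float()
        sim = feats @ feats.t()
        pair_loss = ((1.0 - sim) * same_label).sum() / same_label.sum().clamp(min=1.0)

    return sup_loss + w_consistency * consistency_loss + w_pair * pair_loss
```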