Authors: Sheng Xiaohui; Shen Hailong[1] (School of Science, Northeastern University, Shenyang 110819, China)
Affiliation: [1] School of Science, Northeastern University, Shenyang 110819, China
Source: Application Research of Computers, 2023, No. 4, pp. 1019-1023, 1051 (6 pages)
Abstract: To reduce the dependence on labeled data and make full use of large amounts of unlabeled data, this paper proposes STAP (semi-supervised text classification algorithm with data augmentation and similar pseudo-labels). The algorithm uses the EPiDA (easy plug-in data augmentation) framework and self-training to expand a small amount of labeled data. It uses consistency training and similar pseudo-labels to model the relationship between unlabeled data and their augmented samples, as well as the relationship between high-confidence similar unlabeled data. Under the constraints of a supervised cross-entropy loss, an unsupervised consistency loss, and an unsupervised pair loss, it improves the quality of the unlabeled data. Experiments on four text classification datasets show that STAP clearly improves over other classical text classification algorithms.
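Since only the abstract is available here, the following PyTorch-style sketch illustrates one possible reading of the three-part objective it describes (supervised cross-entropy, unsupervised consistency, unsupervised pair loss). The function name, the use of KL divergence for consistency, the cosine-similarity pair term, the confidence threshold, and the loss weights are all assumptions, not the paper's exact formulation.

```python
# Hedged sketch of a STAP-style combined loss; details are assumed, not from the paper.
import torch
import torch.nn.functional as F

def stap_style_loss(model, labeled_x, labeled_y, unlabeled_x, unlabeled_aug_x,
                    conf_threshold=0.9, w_consistency=1.0, w_pair=1.0):
    """Supervised CE + unsupervised consistency + unsupervised pair loss (sketch)."""
    # 1) Supervised cross-entropy on the (possibly augmented) labeled data.
    sup_loss = F.cross_entropy(model(labeled_x), labeled_y)

    # 2) Consistency loss: an unlabeled example and its augmented version
    #    should receive similar predictions (KL divergence is an assumption).
    with torch.no_grad():
        p_unlabeled = F.softmax(model(unlabeled_x), dim=-1)
    log_p_aug = F.log_softmax(model(unlabeled_aug_x), dim=-1)
    consistency_loss = F.kl_div(log_p_aug, p_unlabeled, reduction="batchmean")

    # 3) Pair loss: pull together high-confidence unlabeled examples that share
    #    the same pseudo-label (simple cosine-similarity formulation; assumed).
    conf, pseudo = p_unlabeled.max(dim=-1)
    mask = conf > conf_threshold
    pair_loss = torch.zeros((), device=sup_loss.device)
    if mask.sum() > 1:
        # Logits are reused as features here purely for illustration.
        feats = F.normalize(model(unlabeled_x[mask]), dim=-1)
        same_label = (pseudo[mask].unsqueeze(0) == pseudo[mask].unsqueeze(1)).float()
        sim = feats @ feats.t()
        pair_loss = ((1.0 - sim) * same_label).sum() / same_label.sum().clamp(min=1.0)

    return sup_loss + w_consistency * consistency_loss + w_pair * pair_loss
```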