Authors: ZOU Boshi, YANG Ming, ZONG Chenchen, XIE Mingkun, HUANG Shengjun [1]
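Affiliation: [1] College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 211106, China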
Source: Journal of Computer Applications (《计算机应用》), 2024, No. 5, pp. 1479-1484 (6 pages)
Abstract: Noisy label learning methods can effectively train models on data containing noisy labels, significantly reducing the labeling cost of large-scale datasets. Most existing noisy label learning methods assume that the number of examples in each class is balanced, but data in many real-world scenarios contain noisy labels while their true distribution is long-tailed. This makes it difficult for existing methods to design effective indicators, such as training loss or confidence, for separating clean examples from noisy examples in the tail classes. To solve the noisy long-tailed learning problem, a method for ReWeighting examples with Negative Learning (NLRW) is proposed, which adaptively reweights examples based on negative learning. Specifically, at each training epoch, example weights are computed from the model's output distributions over head classes and tail classes, so that the weights of clean examples are close to 1 and those of noisy examples are close to 0. To ensure that the model's outputs, and hence the estimated weights, are accurate, negative learning and cross-entropy loss are combined to train the model with an example-weighted loss function. Experimental results on the CIFAR-10 and CIFAR-100 datasets with various imbalance rates and noise rates show that, compared with TBSS (Two stage Bi-dimensional Sample Selection), the best baseline model for noisy long-tailed classification, NLRW improves average accuracy by 4.79% and 3.46%, respectively.
Keywords: noisy label learning; long-tailed learning; noisy long-tailed learning; example reweighting; negative learning
Classification number: TP391.4 [Automation and Computer Technology: Computer Application Technology]
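
The abstract describes training with an example-weighted loss that combines cross-entropy with negative learning. The sketch below illustrates that general idea only and is not the authors' implementation. The abstract does not give the weight formula, so `weights` is taken as a precomputed input. Applying the weight to the cross-entropy term alone, the random choice of complementary label, and the function names `weighted_ce_nl_loss` and `sample_complementary` are all assumptions made for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_complementary(label, num_classes, rng):
    """Pick a class uniformly at random from all classes except `label`."""
    c = rng.randrange(num_classes - 1)
    return c if c < label else c + 1


def weighted_ce_nl_loss(logits_batch, labels, weights, comp_labels, eps=1e-12):
    """Batch mean of w_i * CE(p_i, y_i) + NL(p_i, c_i).

    CE is the usual cross-entropy on the given (possibly noisy) label y_i.
    NL is the negative-learning term -log(1 - p_i[c_i]), which pushes the
    predicted probability of the complementary label c_i toward zero.
    ASSUMPTION: the weight scales only the CE term; the abstract does not
    specify how the two terms are combined.
    """
    total = 0.0
    for logits, y, w, c in zip(logits_batch, labels, weights, comp_labels):
        p = softmax(logits)
        ce = -math.log(p[y] + eps)
        nl = -math.log(1.0 - p[c] + eps)
        total += w * ce + nl
    return total / len(labels)


if __name__ == "__main__":
    rng = random.Random(0)
    logits_batch = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]
    labels = [0, 2]
    weights = [1.0, 0.0]  # hypothetical: first example treated as clean, second as noisy
    comp = [sample_complementary(y, 3, rng) for y in labels]
    print(weighted_ce_nl_loss(logits_batch, labels, weights, comp))
```

With a weight near 0, an example contributes only through the negative-learning term, so a suspected noisy label does not pull the model toward that label while the example still provides the weaker "not this class" signal.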