Authors: ZHAI De-Ming (翟德明)[1], SHEN Si-Xian (沈斯娴), ZHOU Xiong (周雄), JIANG Jun-Jun (江俊君), LIU Xian-Ming (刘贤明)[1], JI Xiang-Yang (季向阳)[2]
Affiliations: [1] Faculty of Computing, Harbin Institute of Technology, Harbin 150001, Heilongjiang, China; [2] Department of Automation, Tsinghua University, Beijing 100084, China
Source: Journal of Software (《软件学报》), 2024, No. 11, pp. 5196-5209 (14 pages)
Funding: National Natural Science Foundation of China (6207115, 61922027).
Abstract: Deep learning is widely used across many fields and achieves excellent performance, but it usually depends on large amounts of labeled data, which are costly to obtain and restrict where the methods can be applied. How to overcome this data limitation in practical scenarios has therefore become an important research goal, and semi-supervised learning is one of the main directions. By using abundant unlabeled data to assist learning from a small amount of labeled data, semi-supervised learning greatly relieves the data requirements of deep learning. Pseudo-label generation is an important component of current semi-supervised learning, and the quality of the generated pseudo-labels largely determines the final result. Focusing on this problem, this study proposes a pseudo-label generation method based on optimal transport theory. The method uses labeled information to guide the generation process and introduces a class-balance constraint; on this basis, pseudo-label generation is recast as an optimal transport optimization problem, which gives a new formulation for solving it. To solve this problem, the Sinkhorn-Knopp algorithm is introduced to obtain a fast approximate solution and avoid intractable computation. As an independent module, the proposed method can be combined with other semi-supervised learning techniques, such as consistency regularization, to form a complete semi-supervised learning pipeline. Experiments on four classic public image classification datasets (CIFAR-10, SVHN, MNIST, and FashionMNIST) verify the effectiveness of the method. The results show that the proposed method outperforms the state-of-the-art semi-supervised learning methods it is compared with, with especially large gains when few labels are available.
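The abstract describes recasting pseudo-label generation as an optimal transport problem with a class-balance constraint, solved approximately with the Sinkhorn-Knopp algorithm. The following is a minimal illustrative sketch of that general idea, not the authors' implementation. It assumes a uniform class prior, a cost of negative log-probability, and arbitrary default values for the hypothetical parameters `epsilon` and `n_iters`; the function name `sinkhorn_pseudo_labels` is also hypothetical. The labeled-data guidance mentioned in the abstract is omitted because the abstract does not specify its form.

```python
import numpy as np


def sinkhorn_pseudo_labels(probs, epsilon=0.5, n_iters=100):
    """Balance soft predictions with Sinkhorn-Knopp iterations.

    probs: (N, K) array of predicted class probabilities for N unlabeled samples.
    Returns an (N, K) array whose rows sum to 1 and whose columns each sum
    to roughly N / K (the assumed class-balance constraint).
    """
    probs = np.asarray(probs, dtype=np.float64)
    n, k = probs.shape
    # Assumed cost is -log p, so the Gibbs kernel exp(-cost / epsilon)
    # equals p ** (1 / epsilon). Build it in log space for stability.
    log_kernel = np.log(np.clip(probs, 1e-12, None)) / epsilon
    # A per-row shift is absorbed by the row scaling, so the result is unchanged.
    log_kernel -= log_kernel.max(axis=1, keepdims=True)
    plan = np.exp(log_kernel)

    row_target = 1.0 / n  # every sample carries equal mass
    col_target = 1.0 / k  # assumption: uniform class prior
    for _ in range(n_iters):
        # Alternately rescale columns and rows toward their target marginals.
        plan *= col_target / plan.sum(axis=0, keepdims=True)
        plan *= row_target / plan.sum(axis=1, keepdims=True)
    return plan * n  # rescale so each row is a distribution over classes


# Usage: synthetic predictions that are biased toward class 0.
rng = np.random.default_rng(0)
logits = rng.normal(size=(120, 4))
logits[:, 0] += 2.0
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
soft_labels = sinkhorn_pseudo_labels(probs)
hard_labels = soft_labels.argmax(axis=1)
```

In this sketch a smaller `epsilon` gives sharper, closer-to-one-hot pseudo-labels but typically needs more iterations to satisfy the column constraint, so the value would have to be tuned for a real model.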
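Keywords: semi-supervised learning; pseudo-label generation; optimal transport; image classification; deep learning
Classification number: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]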