Affiliations: [1] Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190 [2] School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 101408 [3] State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093 [4] School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049 [5] Key Laboratory of Big Data Mining and Knowledge Management, University of Chinese Academy of Sciences, Beijing 101408 [6] Peng Cheng Laboratory, Shenzhen, Guangdong 518055
Source: Chinese Journal of Computers, 2024, No. 11, pp. 2678-2690 (13 pages)
Funding: Supported by the Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2018AAA0102000); the National Natural Science Foundation of China (62236008, U21B2038, U23B2051, 61931008, 62122075, 61976202); the Fundamental Research Funds for the Central Universities; the Youth Innovation Promotion Association of the Chinese Academy of Sciences; the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB0680000); and the Innovation Funding of the Institute of Computing Technology, Chinese Academy of Sciences (E000000).
Abstract: Multi-task learning (MTL) exploits the correlation between similar tasks to transfer knowledge, so that a model can still generalize well when data are insufficient. Most existing methods in this area assume a balanced class distribution and use accuracy-based metrics as the benchmark evaluation. However, many practical applications, such as disease detection and spam filtering, suffer from imbalanced sample distributions, which degrades model performance. Furthermore, multi-task learning places high demands on task relevance and is prone to negative transfer: when the model learns and shares irrelevant knowledge across tasks, training can be misled in the wrong direction. Consequently, most existing methods cannot be applied effectively in such scenarios, and designing a multi-task learning algorithm that can learn from imbalanced samples and low-correlation tasks is of both practical and methodological importance. This paper proposes a multi-task AUC optimization algorithm based on an adaptive low-rank Factor Nuclear Norm minus Frobenius Norm (FNNFN) regularizer, dubbed MTAUC-FNNFN. First, the area under the ROC curve (AUC), which is insensitive to the label distribution, is adopted as the evaluation metric, and a multi-task learning algorithm for AUC optimization is established to improve performance under imbalanced sample distributions. Second, because the pairwise AUC loss is discontinuous, non-differentiable, and expensive to optimize directly, the original pairwise optimization problem is reformulated as a per-sample minimax optimization problem, reducing the per-iteration complexity from O(L·n_{i,+}·n_{i,-}) to O(L·(n_{i,+}+n_{i,-})). Third, to counter the negative transfer phenomenon in multi-task learning, an adaptive low-rank regularizer is introduced to remove redundant model information and improve generalization. Finally, comparisons with multiple baseline methods on four synthetic datasets and three real-world datasets (Landmine, MHC-I, and USPS) consistently demonstrate the effectiveness of the proposed algorithm.
CLC Number: TP18 [Automation and Computer Technology — Control Theory and Control Engineering]
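
To make the complexity reduction stated in the abstract concrete, the sketch below contrasts, for a single task, the naive pairwise squared surrogate of 1 − AUC (which enumerates all positive/negative pairs) with an equivalent per-sample computation based on class-wise moments. This is only an illustration of the pairwise-to-per-sample idea under the common square surrogate; it is not the paper's MTAUC-FNNFN algorithm (which uses a minimax reformulation and the FNNFN regularizer), and the function names, score distributions, and NumPy setup are assumptions made for the example.

    import numpy as np

    def pairwise_auc_sq_loss(scores_pos, scores_neg):
        # Naive pairwise squared surrogate for 1 - AUC: enumerates all
        # n_{i,+} * n_{i,-} positive/negative score pairs of one task.
        diffs = 1.0 - (scores_pos[:, None] - scores_neg[None, :])
        return np.mean(diffs ** 2)

    def per_sample_auc_sq_loss(scores_pos, scores_neg):
        # The same quantity computed from per-class first and second moments,
        # touching each sample once: O(n_{i,+} + n_{i,-}) per pass.
        m_p, m_n = scores_pos.mean(), scores_neg.mean()
        q_p, q_n = (scores_pos ** 2).mean(), (scores_neg ** 2).mean()
        # Expansion of E[(1 - (s_+ - s_-))^2] over independent positives/negatives.
        return 1.0 - 2.0 * (m_p - m_n) + q_p + q_n - 2.0 * m_p * m_n

    rng = np.random.default_rng(0)
    s_pos = rng.normal(1.0, 1.0, size=500)    # hypothetical positive-class scores
    s_neg = rng.normal(0.0, 1.0, size=2000)   # hypothetical negative-class scores
    assert np.isclose(pairwise_auc_sq_loss(s_pos, s_neg),
                      per_sample_auc_sq_loss(s_pos, s_neg))

Summed over L tasks, the first form costs O(L·n_{i,+}·n_{i,-}) per pass while the second costs O(L·(n_{i,+}+n_{i,-})), matching the reduction described in the abstract; the paper goes further by casting the per-sample form as a minimax problem and coupling it with the adaptive low-rank regularizer.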