Authors: LI Xiang; CHEN Shuo; YANG Jian (PCALab, Department of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094)
Affiliation: [1] PCALab, School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
Source: Chinese Journal of Computers (《计算机学报》), 2021, Issue 10, pp. 2122-2134 (13 pages)
Funding: Supported by the National Natural Science Foundation of China (No. U1713208) and the 111 Project (AH92005).
Abstract: Empirical Risk Minimization (ERM) aims to learn model parameters that fit the observed examples as well as possible, giving the model its basic recognition ability. Beyond ERM, the Weight Decay (WD) regularization term is also important for further improving generalization, i.e., accurate recognition of unseen examples. However, WD in its concrete form merely shrinks the learned parameters during optimization, which is hard to connect directly to the notion of generalization, especially for multi-layer deep networks. Starting from the quantitative relation between robustness and generalization in computational learning theory, this paper derives a unified Generalization Bound Regularizer (GBR) to explain the role of WD. We prove that optimizing the WD term, as part of the loss objective, is in essence optimizing an upper bound of the underlying GBR, which is directly and theoretically related to the generalization ability of the model. For a single-layer linear system, this upper bound can be derived directly; for a multi-layer deep network, it is obtained by relaxing several inequalities. By introducing the Equivalent Norm Constraint (ENC), i.e., enforcing the equality conditions of these inequalities, we further tighten the gap between the GBR and its upper bound and thereby obtain a model with better generalization, whose recognition performance is comprehensively validated on the large-scale ImageNet dataset.
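To make the bound-tightening argument concrete, the following is a minimal sketch of how an equal-norm condition can close the gap between a WD-style sum of squared layer norms and a product-of-norms quantity; it assumes, purely for illustration, that the GBR scales with the product of Frobenius norms of the layer weights $W_1,\dots,W_L$, which may differ from the paper's exact derivation. The training objective with WD is

$$\min_{\theta}\;\frac{1}{n}\sum_{i=1}^{n}\ell\big(f_{\theta}(x_i),y_i\big)\;+\;\frac{\lambda}{2}\sum_{l=1}^{L}\lVert W_l\rVert_F^{2},$$

and by the arithmetic-geometric mean inequality

$$\Big(\prod_{l=1}^{L}\lVert W_l\rVert_F^{2}\Big)^{1/L}\;\le\;\frac{1}{L}\sum_{l=1}^{L}\lVert W_l\rVert_F^{2},$$

with equality if and only if $\lVert W_1\rVert_F=\cdots=\lVert W_L\rVert_F$. Under this reading, the WD sum upper-bounds the product-based quantity, and an equal-norm constraint across layers is exactly the equality condition that makes the relaxation tight, which is consistent with the abstract's description of ENC.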
Keywords: generalization bound regularizer; empirical risk minimization; weight decay; equivalent norm constraint; deep neural networks
Classification: TP391 [Automation and Computer Technology / Computer Application Technology]