Generalization Bound Regularizer: A Unified Perspective for Understanding Weight Decay  (Cited by: 2)


Authors: LI Xiang (李翔); CHEN Shuo (陈硕); YANG Jian (杨健) (PCALab, Department of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094)

Affiliation: [1] PCALab, Department of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094

Source: Chinese Journal of Computers, 2021, Issue 10, pp. 2122-2134 (13 pages)

Funding: National Natural Science Foundation of China (No. U1713208); 111 Project (AH92005).

Abstract: Empirical Risk Minimization (ERM) aims to learn model parameters that fit the observed samples as well as possible, giving the model its basic recognition ability. Beyond ERM, the Weight Decay (WD) regularization term is also important for further improving generalization, i.e., accurate recognition of unseen samples. However, the concrete form of WD merely shrinks the learned parameters during optimization, which is hard to connect directly to the concept of generalization, especially for multi-layer deep networks. Starting from the quantitative relation between robustness and generalization in computational learning theory, this paper derives a unified Generalization Bound Regularizer (GBR) to explain the role of WD. It proves that optimizing the WD term, as part of the loss objective, is in essence optimizing an upper bound of the underlying GBR, which in turn has a direct theoretical link to the generalization ability of the model. For a single-layer linear system, this upper bound can be derived directly; for a multi-layer deep neural network, it is obtained by relaxing several inequalities. By introducing an Equivalent Norm Constraint (ENC), which enforces the equality conditions of these inequalities and thereby tightens the gap between the GBR and its upper bound, a model with better generalization ability is obtained; its recognition performance is comprehensively validated on the large-scale ImageNet dataset.
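The loss structure the abstract describes, an ERM term plus a weight-decay term whose optimization the paper ties to an upper bound on the GBR, can be sketched for the single-layer linear case. This is a minimal illustration, not the authors' code; the function names and the squared-error loss are assumptions for the sketch.

```python
import numpy as np

def erm_loss(W, X, y):
    """Empirical risk: mean squared error of a linear model on observed samples."""
    preds = X @ W
    return np.mean((preds - y) ** 2)

def weight_decay(W, lam):
    """Weight-decay term: lam times the squared Frobenius norm of the parameters.
    Per the abstract, optimizing this term is optimizing an upper bound of the GBR."""
    return lam * np.sum(W ** 2)

def total_loss(W, X, y, lam):
    """Loss objective = ERM term + WD term, as described in the abstract."""
    return erm_loss(W, X, y) + weight_decay(W, lam)

def sgd_step(W, X, y, lam, lr=0.1):
    """One gradient step. The WD term contributes 2*lam*W to the gradient,
    i.e. it shrinks the learned parameters at every update."""
    grad_erm = 2 * X.T @ (X @ W - y) / len(X)
    grad_wd = 2 * lam * W
    return W - lr * (grad_erm + grad_wd)
```

The gradient of the WD term makes the "shrinking" behavior explicit: with zero data gradient, each step multiplies `W` by `(1 - 2 * lr * lam)`, which is the classical weight-decay update.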

Keywords: generalization bound regularizer; empirical risk minimization; weight decay; equivalent norm constraint; deep neural network
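Among the keywords, the equivalent norm constraint can be made concrete with a standard fact about ReLU networks: since ReLU is positively homogeneous, rescaling adjacent layers by a factor and its inverse preserves the network function, so layer norms can be equalized without changing predictions. The sketch below assumes this rescaling view; it is an illustration of the idea, not the paper's implementation.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def forward(x, W1, W2):
    """Two-layer ReLU network: f(x) = W2 @ relu(W1 @ x)."""
    return W2 @ relu(W1 @ x)

def equalize_norms(W1, W2):
    """Rescale (W1, W2) -> (alpha*W1, W2/alpha) so their Frobenius norms match.
    Because relu(alpha*z) = alpha*relu(z) for alpha > 0, the network function
    is unchanged while the layer norms become equal."""
    n1 = np.linalg.norm(W1)
    n2 = np.linalg.norm(W2)
    alpha = np.sqrt(n2 / n1)
    return alpha * W1, W2 / alpha
```

After rescaling, both layers have norm `sqrt(n1 * n2)`; this is the kind of equality condition that tightens norm-product inequalities of the form used in multi-layer generalization bounds.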

Classification: TP391 [Automation and Computer Technology / Computer Application Technology]

 
