Authors: Yaoyu Zhang, Tao Luo, Zheng Ma, Zhi-Qin John Xu (张耀宇, 罗涛, 马征, 许志钦)
Affiliations: [1] School of Mathematical Sciences, Institute of Natural Sciences, MOE-LSC, and Qing Yuan Research Institute, Shanghai Jiao Tong University, Shanghai 200240, China; [2] Shanghai Center for Brain Science and Brain-Inspired Technology, Shanghai 200031, China
Source: Chinese Physics Letters (中国物理快报, English edition), 2021, Issue 3, pp. 121-126 (6 pages)
Funding: Supported by the National Key R&D Program of China (Grant No. 2019YFA0709503); the Shanghai Sailing Program; the Natural Science Foundation of Shanghai (Grant No. 20ZR1429000); the National Natural Science Foundation of China (Grant No. 62002221); Shanghai Municipal of Science and Technology Project (Grant No. 20JC1419500); and the HPC of the School of Mathematical Sciences at Shanghai Jiao Tong University.
Abstract: Why heavily parameterized neural networks (NNs) do not overfit the data is an important, long-standing open question. We propose a phenomenological model of NN training to explain this non-overfitting puzzle. Our linear frequency principle (LFP) model accounts for a key dynamical feature of NNs: they learn low frequencies first, irrespective of microscopic details. Theory based on our LFP model shows that low-frequency dominance of target functions is the key condition for the non-overfitting of NNs, and this is verified by experiments. Furthermore, through an ideal two-layer NN, we unravel how detailed microscopic NN training dynamics statistically give rise to an LFP model with quantitative predictive power.
Classification code: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]