Stochastic normalized gradient descent with momentum for large-batch training  


Authors: Shen-Yi ZHAO, Chang-Wei SHI, Yin-Peng XIE, Wu-Jun LI

Affiliation: [1] National Key Laboratory for Novel Software Technology, Department of Computer Science and Technology, Nanjing University, Nanjing 210023, China

Source: Science China (Information Sciences), 2024, Issue 11, pp. 73-87 (15 pages)

Funding: supported by the National Key R&D Program of China (Grant No. 2020YFA0713901); the National Natural Science Foundation of China (Grant Nos. 61921006, 62192783); and the Fundamental Research Funds for the Central Universities (Grant No. 020214380108).

Abstract: Stochastic gradient descent (SGD) and its variants have been the dominant optimization methods in machine learning. Compared with small-batch training, SGD with large-batch training can better utilize the computational power of current multi-core systems such as graphics processing units (GPUs) and can reduce the number of communication rounds in distributed training settings. Thus, SGD with large-batch training has attracted considerable attention. However, existing empirical results show that large-batch training typically leads to a drop in generalization accuracy. Hence, guaranteeing generalization ability in large-batch training becomes a challenging task. In this paper, we propose a simple yet effective method, called stochastic normalized gradient descent with momentum (SNGM), for large-batch training. We prove that, with the same number of gradient computations, SNGM can adopt a larger batch size than momentum SGD (MSGD), one of the most widely used variants of SGD, to converge to an ε-stationary point. Empirical results on deep learning verify that, when adopting the same large batch size, SNGM can achieve better test accuracy than MSGD and other state-of-the-art large-batch training methods.
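The abstract describes SNGM only at a high level. The following is a minimal sketch of a normalized-momentum update of the kind the method's name suggests, in which the momentum buffer is normalized before the step so that each update has norm at most the learning rate; the function name, hyperparameter values, and the eps safeguard are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def sngm_step(w, u, grad, lr=0.1, beta=0.9, eps=1e-8):
    """One hypothetical SNGM-style update: accumulate the stochastic
    gradient into a momentum buffer, then step in the direction of the
    normalized buffer so the step length is bounded by lr."""
    u = beta * u + grad                          # momentum accumulation
    w = w - lr * u / (np.linalg.norm(u) + eps)   # normalized step
    return w, u
```

Bounding every step length in this way is one plausible reason such a method can tolerate larger batch sizes than plain MSGD, whose step length scales with the (possibly large) momentum norm; the convergence guarantee itself is established in the paper.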

Keywords: non-convex problems; large-batch training; stochastic normalized gradient descent; momentum

Classification: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

 
