Review of Adaptive Stepsize Momentum Optimization Methods in Deep Learning

Authors: TAO Wei; LONG Sheng; LIU Xin; HU Yahao; HUANG Jincai

Affiliations: [1] Strategic Assessments and Consultation Institute, Academy of Military Science, Beijing 100091, China; [2] Key Laboratory of Big Data and Decision, National University of Defense Technology, Changsha 410073, China; [3] Command and Control Engineering School, Army Engineering University, Nanjing 210007, China

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2025, No. 2, pp. 257-265 (9 pages)

Funding: Supported by the National Natural Science Foundation of China (Grant No. 62106281).

Abstract: Generative artificial intelligence built on deep neural networks and pretrained models has drawn wide attention from academia and industry, and research on deep learning has reached an unprecedented level. Since their introduction in 2015, adaptive stepsize momentum optimization methods, represented by Adam, have become the method of choice for deep learning training in image, speech, and text domains, owing to their fast convergence and their ability to adapt to varying gradients and parameter settings. Several issues nevertheless remain: 1) the global convergence behavior of these algorithms can be poor; 2) the parameter selection strategies used in practice are inconsistent with theoretical analysis; 3) generalization performance across different tasks still needs improvement. To analyze and address these challenges, researchers have studied adaptive momentum methods extensively through two optimization techniques: adaptive stepsizes and momentum. This paper is a survey of that line of work. It first reviews the background of deep learning optimization and the challenges it faces, then focuses on first-order adaptive stepsize methods, momentum algorithms, adaptive stepsize momentum algorithms, and their applications in large models, with a systematic account of convergence results in the convex setting, and finally discusses future research directions for adaptive stepsize momentum algorithms.
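To make the two techniques the survey centers on concrete, below is a minimal NumPy sketch of the standard Adam update, which couples a momentum estimate (first moment) with a per-coordinate adaptive stepsize (second moment). It follows the usual Kingma-Ba formulation rather than code from this paper; the function name adam_step and the default hyperparameters (lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8) are common conventions chosen here for illustration.

    import numpy as np

    def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        # Momentum: exponential moving average of gradients (first moment).
        m = beta1 * m + (1 - beta1) * grad
        # Adaptive stepsize: exponential moving average of squared gradients (second moment).
        v = beta2 * v + (1 - beta2) * grad ** 2
        # Correct the bias from zero initialization (t is the 1-based step count).
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        # Per-coordinate update: momentum direction scaled by a 1/sqrt(v_hat) stepsize.
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
        return theta, m, v

    # Toy usage: minimize f(x) = 0.5 * ||x||^2, whose gradient is x itself.
    theta = np.array([1.0, -2.0])
    m, v = np.zeros_like(theta), np.zeros_like(theta)
    for t in range(1, 5001):
        theta, m, v = adam_step(theta, grad=theta, m=m, v=v, t=t)
    print(theta)  # values near the minimizer [0, 0]

Because v_hat rescales each coordinate individually, the effective stepsize shrinks where gradients have been large and grows where they have been small; this per-coordinate adaptivity, combined with the momentum direction m_hat, is exactly the pairing whose convergence behavior the surveyed literature analyzes.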

Keywords: deep learning; optimization algorithms; momentum; adaptive stepsize; convergence

CLC Number: TP181 [Automation and Computer Technology - Control Theory and Control Engineering]

 
