Authors: ZHANG Siyuan; ZHU Xiaoqing; CHEN Jiangtao [1,2]; LIU Xinyuan; WANG Tao
Affiliations: [1] Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; [2] Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China
Source: Journal of Tsinghua University (Science and Technology), 2024, Issue 10, pp. 1696-1705 (10 pages)
Funding: National Natural Science Foundation of China, Young Scientists Fund (62103009); Beijing Natural Science Foundation, General Program (4202005).
Abstract: Animals' ability to adapt to the natural world is shaped by environmental selection and survival of the fittest: quadruped mammals gradually adapt to environmental change through population-level evolution, improving both their fitness to the environment and the survival rate of the population. Inspired by this, this paper builds on the soft actor-critic (SAC) algorithm and proposes OP-SAC, an optimized parallel reinforcement-learning algorithm that alternates between evolutionary-strategy training and reinforcement learning and uses knowledge sharing and knowledge inheritance to improve the quadruped robot's learning performance and training efficiency. Validation shows that OP-SAC can learn a bionic gait for a quadruped robot; comparative experiments show that OP-SAC is more robust than other SAC variants combined with evolutionary strategies; and ablation experiments confirm that the knowledge-sharing and knowledge-inheritance strategies substantially improve training performance.
[Objective] Inspired by the skill learning of quadruped animals in nature, deep reinforcement learning has been widely applied to learning quadruped robot locomotion skills. Through interaction with the environment, robots can autonomously learn complete motion strategies. However, traditional reinforcement learning has several drawbacks, such as large computational requirements, slow convergence, and rigid learning strategies, which substantially reduce training efficiency and incur unnecessary time costs. To address these shortcomings, this paper introduces evolutionary strategies into the soft actor-critic (SAC) algorithm, proposing an optimized parallel SAC (OP-SAC) algorithm for the parallel training of quadruped robots using evolutionary strategies and reinforcement learning.
[Methods] The algorithm first uses a variant temperature-coefficient SAC algorithm to reduce the impact of the temperature hyperparameter on the training process, and then introduces evolutionary strategies, using the reference trajectory trained by the evolutionary strategy as a sample input that guides the training direction of the SAC algorithm. In turn, the state information and reward values obtained from SAC training serve as inputs and offspring-selection thresholds for the evolutionary strategy, decoupling the training data of the two learners. The algorithm adopts an alternating training approach with a knowledge-sharing strategy, in which the training results of the evolutionary strategy and reinforcement learning are stored in a common experience pool. A knowledge-inheritance mechanism is also introduced, allowing the training results of both strategies to be passed on to the next stage of the algorithm. With these two training strategies, the evolutionary strategy and reinforcement learning guide each other's training direction and pass useful information between generations, thereby accelerating the learning process and enhancing the robustness of the algorithm.
[Results
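The [Methods] passage above maps naturally onto a two-phase training loop. The sketch below illustrates one way the alternating scheme, shared experience pool, and knowledge inheritance could fit together. It is a minimal illustration based only on the abstract, not the authors' implementation; every class and method name here (SharedExperiencePool, and agent calls such as mean_episode_reward, evolve, step, update, inherit_from) is a hypothetical placeholder.

```python
# Hypothetical sketch of the alternating OP-SAC scheme described in the
# abstract. The sac_agent and es objects stand in for a SAC learner and an
# evolutionary strategy; their methods are illustrative names only.
import random
from collections import deque


class SharedExperiencePool:
    """Knowledge sharing: one buffer holds transitions from both learners."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Sample at most as many transitions as the pool currently holds.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))


def train_op_sac(sac_agent, es, env, generations=100, sac_steps=1000):
    """Alternate ES and SAC phases; both write to a common experience pool."""
    pool = SharedExperiencePool()
    for gen in range(generations):
        # --- Evolutionary-strategy phase ---------------------------------
        # SAC-derived reward statistics act as the offspring-selection
        # threshold; the surviving best individual yields the next
        # reference trajectory (both calls are hypothetical).
        threshold = sac_agent.mean_episode_reward()
        reference_trajectory = es.evolve(env, selection_threshold=threshold)
        for transition in reference_trajectory:
            pool.add(transition)  # knowledge sharing: ES results enter the pool

        # --- Reinforcement-learning (SAC) phase --------------------------
        # The ES reference trajectory guides SAC's training direction;
        # SAC rollouts also feed the pool and, via the threshold above,
        # the next ES selection step (decoupled exchange of training data).
        for _ in range(sac_steps):
            transition = sac_agent.step(env, guide=reference_trajectory)
            pool.add(transition)
            sac_agent.update(pool.sample(256))

        # --- Knowledge inheritance ----------------------------------------
        # Both learners carry their results into the next generation
        # (hypothetical call representing parameter hand-over).
        es.inherit_from(sac_agent)
```

The design point the abstract emphasizes is decoupling: the evolutionary strategy consumes SAC's states and rewards only as inputs and selection thresholds, SAC consumes the ES reference trajectory only as guiding samples, and the common pool is the single place where the two learners exchange experience.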
CLC number: TP393.1 [Automation and Computer Technology: Computer Application Technology]