Motor skill learning of quadruped robot based on environmental feedback mechanism

Times cited: 2

Authors: ZHANG Si-yuan; ZHU Xiao-qing; RUAN Xiao-gang; LI Chun-yang; LIU Xin-yuan

Affiliations: [1] Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; [2] Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing 100124, China

Source: Control and Decision (《控制与决策》), 2024, No. 5, pp. 1461-1468 (8 pages)

Funding: National Natural Science Foundation of China (62103009); Beijing Natural Science Foundation (4202005).

Abstract: The motor learning mechanism of mammals has been studied extensively. Canines can learn motor skills autonomously from guiding information in environmental feedback, and giving them more task-specific training guidance speeds up their learning of the related task. Inspired by this, the paper proposes a reinforcement learning algorithm based on desired-state reward guidance (DSG-SAC), built on the soft actor-critic (SAC) algorithm. The algorithm uses the state feedback mechanism of the environment to guide the quadruped robot toward effective exploration, which improves both the quality of the learned bionic gait and the training efficiency. In DSG-SAC, the policy network and the critic network first approximately fit the error between the desired state observation and the current state; then, after positive feedback from the current state, they output the action and the value function, so that the quadruped robot moves in the desired direction. The algorithm is verified on a quadruped robot, and the experimental results show that it can complete bionic gait learning. Ablation experiments are further designed to examine how the temperature coefficient and the discount factor hyperparameters affect the algorithm, and the results show that the improved algorithm outperforms plain SAC.
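The abstract does not give the network architecture, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. It assumes the "error between the desired state observation and the current state" is a simple vector difference, and that the "positive feedback from the current state" can be approximated by passing the current state alongside that error. The function names `guided_features` and `soft_td_target` are hypothetical and do not come from the paper. The second function is the standard SAC soft Bellman target, shown only to make clear where the temperature coefficient (`alpha`) and the discount factor (`gamma`) studied in the ablation enter.

```python
import numpy as np


def guided_features(state, desired_state):
    """Build a network input from the desired-state error and the current state.

    Assumption (not from the paper): the error is desired_state - state, and the
    current state is appended unchanged as the feedback path.
    """
    state = np.asarray(state, dtype=float)
    desired_state = np.asarray(desired_state, dtype=float)
    error = desired_state - state
    return np.concatenate([error, state])


def soft_td_target(reward, next_q, next_log_prob, gamma=0.99, alpha=0.2, done=False):
    """Standard SAC target: r + gamma * (Q(s', a') - alpha * log pi(a'|s')).

    gamma is the discount factor and alpha is the temperature coefficient.
    The bootstrap term is dropped when the episode has ended.
    """
    if done:
        return reward
    return reward + gamma * (next_q - alpha * next_log_prob)


if __name__ == "__main__":
    # Hypothetical 2-dimensional state, for illustration only.
    print(guided_features([1.0, 2.0], [1.5, 2.0]))
    print(soft_td_target(1.0, 2.0, -1.0, gamma=0.5, alpha=0.5))
```

A larger `alpha` weights the entropy term more heavily and so encourages exploration, while `gamma` controls how strongly future returns count; these are the two quantities the ablation experiments vary.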

Keywords: reinforcement learning; quadruped robot; bionic gait learning; environment exploration; state feedback guidance

Classification: TP273 [Automation and Computer Technology: Detection Technology and Automatic Instruments]
