Gait control method based on maximum entropy deep reinforcement learning for biped robot (Cited by: 1)

Authors: LI Yuanchao; TAO Chongben [1,2]; WANG Chen (School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China; Suzhou Automotive Research Institute, Tsinghua University, Suzhou, Jiangsu 215134, China)

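Affiliations: [1] School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China; [2] Suzhou Automotive Research Institute, Tsinghua University, Suzhou, Jiangsu 215134, China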

Source: Journal of Computer Applications, 2024, No. 2, pp. 445-451 (7 pages)

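Funding: National Natural Science Foundation of China (62201375); China Postdoctoral Science Foundation (2021M691848); Natural Science Foundation of Jiangsu Province (BK20220635); Suzhou Science and Technology Project (SS2019029).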

Abstract: To address gait stability control for continuous straight-line walking of a biped robot, a Soft Actor-Critic (SAC) gait control method based on maximum entropy Deep Reinforcement Learning (DRL) is proposed. First, the method does not require an accurate robot dynamics model to be built in advance, and all parameters are derived from joint angles without additional sensors. Second, the cosine similarity method is used to classify experience samples and optimize the experience replay mechanism. Finally, reward functions are designed from knowledge and experience so that the biped robot continuously adjusts its posture during straight-line walking training, which ensures the robustness of straight-line walking. The proposed method was compared with other advanced DRL methods, such as Proximal Policy Optimization (PPO) and Trust Region Policy Optimization (TRPO), in the Roboschool simulation environment. The results show that the proposed method not only achieves fast and stable straight-line walking of the biped robot but is also more robust.

Keywords: biped robot; gait control; deep reinforcement learning; maximum entropy; Soft Actor-Critic algorithm

CLC number: TP242.6 [Automation and Computer Technology: Detection Technology and Automatic Equipment]
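
Illustrative note: the abstract states that cosine similarity is used to classify experience samples, but this record does not say how the classification is carried out. The following is only a minimal sketch of one possible reading, assuming that each stored sample is represented by its state vector and is compared against a single reference vector. The function names `cosine_similarity` and `classify_samples`, the `reference` vector, and the `threshold` value are all hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Return the cosine of the angle between u and v.

    By convention here (an assumption), the result is 0.0 when either
    vector has zero norm, so the caller never divides by zero.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify_samples(
    states: Sequence[Sequence[float]],
    reference: Sequence[float],
    threshold: float = 0.9,  # hypothetical cut-off, not from the paper
) -> Tuple[List[Sequence[float]], List[Sequence[float]]]:
    """Split states into (similar, dissimilar) relative to `reference`."""
    similar: List[Sequence[float]] = []
    dissimilar: List[Sequence[float]] = []
    for state in states:
        if cosine_similarity(state, reference) >= threshold:
            similar.append(state)
        else:
            dissimilar.append(state)
    return similar, dissimilar
```

Under these assumptions, a replay buffer could sample from the two groups at different rates; whether the paper does this, and which vectors it actually compares, cannot be determined from the abstract alone.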
