Gait learning method of quadruped robot based on policy distillation

Authors: ZHU Xiaoqing, WANG Tao, RUAN Xiaogang, CHEN Jiangtao, NAN Borui, BI Lanyue [1,2]

Affiliations: [1] Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; [2] Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China

Source: Journal of Beijing University of Aeronautics and Astronautics, 2025, No. 2, pp. 428-439 (12 pages)

Funding: National Natural Science Foundation of China (62103009); Beijing Natural Science Foundation (4202005).

Abstract: Reinforcement learning algorithms, exemplified by soft actor-critic (SAC), have succeeded in enabling robots to reproduce the motor skills of higher animals. The SAC framework combines policy search with a state-action value function. However, the agent explores greedily with its policy, while the Q-value function estimated by the critic network is biased toward underestimation. To enable the agent to adopt better policies, this paper integrates policy distillation (PD) with SAC and proposes a PD soft actor-critic (PDSAC) algorithm. PDSAC lets the agent explore with a hybrid policy, which accelerates the convergence of the reward obtained through reinforcement learning. To verify its effectiveness, the paper proves theoretically that PDSAC improves policy exploration efficiency and validates the algorithm on a quadruped robot gait learning task. Simulation results show that, compared with SAC, PDSAC increases the reward value by 26.7% and the convergence speed by 40% in the gait learning task.
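The abstract gives no implementation details, so the following Python sketch is only an illustration under stated assumptions, not the authors' PDSAC method. It shows two ingredients the abstract mentions: (1) a hypothetical hybrid exploration rule that, with probability `beta`, samples from a teacher (distilled) policy instead of the SAC student policy; and (2) the standard SAC soft Bellman target with clipped double-Q, where taking the minimum of two critic estimates is one common source of the underestimated Q values noted above. The names `GaussianPolicy`, `mixed_policy_action`, `soft_q_target`, and `beta`, as well as the 1-D Gaussian policies, are hypothetical; the paper's actual mixing rule may differ.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class GaussianPolicy:
    """Hypothetical 1-D Gaussian policy N(mean, std^2), standing in for an actor network."""
    mean: float
    std: float

    def sample(self, rng: random.Random) -> float:
        # Draw one action from N(mean, std^2).
        return rng.gauss(self.mean, self.std)

    def log_prob(self, action: float) -> float:
        # Log-density of N(mean, std^2) evaluated at `action`.
        z = (action - self.mean) / self.std
        return -0.5 * z * z - math.log(self.std) - 0.5 * math.log(2.0 * math.pi)


def mixed_policy_action(student: GaussianPolicy, teacher: GaussianPolicy,
                        beta: float, rng: random.Random) -> tuple[float, str]:
    """Hypothetical hybrid exploration (an assumption, not the paper's rule):
    with probability `beta` act with the teacher (distilled) policy,
    otherwise act with the SAC student policy."""
    if rng.random() < beta:
        return teacher.sample(rng), "teacher"
    return student.sample(rng), "student"


def soft_q_target(reward: float, done: float, gamma: float, alpha: float,
                  q1_next: float, q2_next: float, log_pi_next: float) -> float:
    """Standard SAC soft Bellman target with clipped double-Q:
    y = r + gamma * (1 - done) * (min(Q1', Q2') - alpha * log pi(a'|s')).
    Using the smaller of the two critic estimates biases the target low."""
    soft_value = min(q1_next, q2_next) - alpha * log_pi_next
    return reward + gamma * (1.0 - done) * soft_value


if __name__ == "__main__":
    rng = random.Random(0)
    student = GaussianPolicy(mean=0.0, std=1.0)
    teacher = GaussianPolicy(mean=0.5, std=0.2)
    action, source = mixed_policy_action(student, teacher, beta=0.3, rng=rng)
    print(f"exploratory action {action:.3f} from {source} policy")
    # The entropy term uses the student's log-prob of its own next action.
    a_next = student.sample(rng)
    y = soft_q_target(1.0, 0.0, 0.99, 0.2, q1_next=10.0, q2_next=9.0,
                      log_pi_next=student.log_prob(a_next))
    print(f"soft Q target: {y:.3f}")
```

With `beta = 0.3`, roughly 30% of exploratory actions come from the teacher. Annealing `beta` toward zero over training is one plausible design, but the source does not specify any schedule.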

Keywords: reinforcement learning; policy distillation; hybrid policy; curiosity; exploration strategy; gait learning

CLC number: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
