Authors: YU Hao (余昊)[1]; LIANG Yuchen (梁宇宸)[1]; ZHANG Chi (张驰)[2]; LIU Yuehu (刘跃虎)[2]
Affiliations: [1] School of Software Engineering, Xi'an Jiaotong University, Xi'an 710049, China; [2] College of Artificial Intelligence, Xi'an Jiaotong University, Xi'an 710049, China
Source: Journal of Data Acquisition and Processing (《数据采集与处理》), 2024, No. 5, pp. 1182-1191 (10 pages)
Funding: Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2018AAA0102504).
Abstract: Terrain adaptive ability is the basis for the stable movement of agents under complex terrain conditions. Due to the complexity of the dynamical systems of such agents, e.g., humanoid robots, traditional inverse dynamics methods usually struggle to provide this ability. Recent research has exploited the strengths of reinforcement learning in solving sequential decision-making problems to train agents to adapt to terrain, but these single-task learning methods cannot effectively learn the correlations among various terrains. In fact, the complex terrain adaptation task can be viewed as a multi-task problem in which the relationships between sub-tasks are measured by different terrain factors; mutual learning among the sub-task models then compensates for incomplete knowledge of the data distribution. This paper therefore proposes a multi-task reinforcement learning method. It consists of an execution layer composed of pre-trained sub-task models and a reinforcement-learning-based decision layer that fuses the execution-layer models through soft constraints. Experiments on the LeggedGym terrain simulator show that an agent trained with the proposed method moves more stably, falls less often on complex terrain, and generalizes better.
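The abstract does not give implementation details, but the architecture it describes (a frozen execution layer of per-terrain expert policies fused by an RL-trained decision layer under soft constraints) can be sketched as follows. This is a minimal, hypothetical PyTorch illustration: the module names (`DecisionLayer`, `FusedPolicy`), network sizes, and the softmax-weighted fusion are assumptions, not the authors' actual code.

```python
import torch
import torch.nn as nn


class DecisionLayer(nn.Module):
    """Gating network: maps the robot's observation to soft fusion
    weights over the pre-trained sub-task (per-terrain) policies.
    (Hypothetical sketch; not from the paper.)"""

    def __init__(self, obs_dim: int, num_subtasks: int, hidden_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim),
            nn.ELU(),
            nn.Linear(hidden_dim, num_subtasks),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Softmax keeps the weights positive and summing to 1 -- one way
        # to realize a "soft constraint" on how much each expert contributes.
        return torch.softmax(self.net(obs), dim=-1)


class FusedPolicy(nn.Module):
    """Execution layer (frozen sub-task experts) plus decision layer."""

    def __init__(self, experts: list[nn.Module], obs_dim: int):
        super().__init__()
        self.experts = nn.ModuleList(experts)
        for p in self.experts.parameters():
            p.requires_grad_(False)  # experts are pre-trained and frozen
        self.decision = DecisionLayer(obs_dim, len(experts))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        weights = self.decision(obs)                                   # (B, K)
        actions = torch.stack([e(obs) for e in self.experts], dim=1)   # (B, K, A)
        return (weights.unsqueeze(-1) * actions).sum(dim=1)           # (B, A)
```

Under this reading, only the decision layer's parameters would be updated by the reinforcement learning algorithm (e.g., PPO, which LeggedGym-style pipelines commonly use), while the sub-task experts stay fixed; the gate learns which terrain specialist to trust from the current observation.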
Keywords: multi-task learning; imitation learning; reinforcement learning; terrain factors; LeggedGym terrain simulator
Classification: TP18 (Automation and Computer Technology: Control Theory and Control Engineering)