Authors: ZHAO Yu, GUAN Gongshun, GUO Jifeng, YU Xiaoqiang, YAN Peng (School of Astronautics, Harbin Institute of Technology, Harbin 150001, China)
Source: Acta Aeronautica et Astronautica Sinica (《航空学报》), 2021, No. 1, pp. 259-269 (11 pages)
Funding: National Natural Science Foundation of China (61973101); Aeronautical Science Foundation of China (20180577005).
Abstract: An online self-learning trajectory planning method based on deep reinforcement learning is studied for a six-degree-of-freedom (DOF) space floating manipulator capturing moving targets. First, the DH (Denavit-Hartenberg) model of the manipulator is presented, and the multi-rigid-body kinematic and dynamic models are established, accounting for the mechanical coupling characteristics of the combined system. An improved deep deterministic policy gradient algorithm is then proposed, and a multi-agent self-learning system is built with each joint acting as a decision-making agent. A training system following the "offline centralized learning, online distributed execution" scheme is constructed for the manipulator to capture targets in uniform rectilinear motion, with a reward function parameterized by the target relative distance and the total operation time. Finally, numerical simulations show that the manipulator rapidly captures targets moving uniformly in arbitrary directions, with an average completion time of 5.4 s. Compared with traditional planning algorithms based on random sampling, the proposed autonomous decision-making motion planning method achieves better solution speed and robustness.
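The abstract describes a reward function parameterized by the target relative distance and the total operation time. As a minimal sketch of that idea (the weights, capture radius, and bonus below are illustrative assumptions, not values from the paper), such a reward could take the form:

```python
def capture_reward(rel_distance, elapsed_time,
                   capture_radius=0.05, w_dist=1.0,
                   w_time=0.01, capture_bonus=10.0):
    """Reward combining target relative distance and operation time.

    Penalizes remaining distance to the target and accumulated
    operation time, and grants a terminal bonus once the end-effector
    is within the capture radius. All coefficients are hypothetical.
    """
    r = -w_dist * rel_distance - w_time * elapsed_time
    if rel_distance < capture_radius:
        r += capture_bonus  # successful capture of the moving target
    return r
```

Under this shaping, a closer end-effector always earns a higher reward at equal time, which drives each joint agent toward faster interception.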
Classification: V447 [Aerospace Science and Technology: Aircraft Design]