Authors: ZHAO Yu, GUAN Gongshun, GUO Jifeng, YU Xiaoqiang, YAN Peng (School of Astronautics, Harbin Institute of Technology, Harbin 150001, China)
Source: Acta Aeronautica et Astronautica Sinica (《航空学报》), 2021, No. 1, pp. 259-269 (11 pages)
Funding: National Natural Science Foundation of China (61973101); Aeronautical Science Foundation of China (20180577005).
Abstract: An online self-learning trajectory planning method based on deep reinforcement learning is studied for a six-degree-of-freedom (DOF) space floating manipulator capturing moving targets. First, the DH (Denavit-Hartenberg) model of the manipulator is presented, and the multi-rigid-body kinematic and dynamic models are established, accounting for the mechanical coupling characteristics of the combined system. An improved deep deterministic policy gradient algorithm is then proposed, and a multi-agent self-learning system is built with each joint acting as a decision-making agent. A training system following the "offline centralized learning, online distributed execution" scheme is constructed for the manipulator to capture targets in uniform rectilinear motion, with a reward function parameterized by the target relative distance and the total operation time. Finally, numerical simulations show that the manipulator rapidly captures targets moving uniformly in arbitrary directions, with an average completion time of 5.4 s. Compared with traditional planning algorithms based on random sampling, the proposed autonomous decision-making motion planning method achieves better solution speed and robustness.
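The abstract describes a reward function parameterized by the target relative distance and the total operation time. As a minimal sketch of that idea (the weights, capture radius, and bonus below are illustrative assumptions, not values from the paper), such a reward could take the form:

```python
def capture_reward(rel_distance, elapsed_time,
                   capture_radius=0.05, w_dist=1.0,
                   w_time=0.01, capture_bonus=10.0):
    """Reward combining target relative distance and operation time.

    Penalizes remaining distance to the target and accumulated
    operation time, and grants a terminal bonus once the end-effector
    is within the capture radius. All coefficients are hypothetical.
    """
    r = -w_dist * rel_distance - w_time * elapsed_time
    if rel_distance < capture_radius:
        r += capture_bonus  # successful capture of the moving target
    return r
```

Under this shaping, a closer end-effector always earns a higher reward at equal time, which drives each joint agent toward faster interception.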
Classification: V447 [Aerospace Science and Technology: Aircraft Design]