Deep reinforcement learning tracking control for robotic manipulator based on selective hindsight experience replay

Authors: Yi Jiahao; Wang Fujie; Hu Jintao; Li Xing; Luo Junxuan (Dept. of Computer Science, Dongguan University of Technology, Dongguan, Guangdong 523000, China)

Affiliation: [1] Dept. of Computer Science, Dongguan University of Technology, Dongguan, Guangdong 523000, China

Source: Application Research of Computers (《计算机应用研究》), 2025, No. 3, pp. 834-839 (6 pages)

Funding: National Natural Science Foundation of China (62203116, 62273095); Guangdong Basic and Applied Basic Research Foundation (2024A1515010222); Natural Science Foundation of Liaoning Province (2022-KF-21-06); Characteristic Innovation Project of the Department of Education of Guangdong Province (2022KTSCX138); Key Project of the Dongguan Social Development Science and Technology Program (20231800935882); Songshan Lake Science and Technology Commissioner Program (20234430-01KCJ-G).

Abstract: For the trajectory tracking problem of robotic manipulators, this paper proposes a deep reinforcement learning (DRL) control method that incorporates selective hindsight experience replay (SHER). The method combines SHER with the deep deterministic policy gradient (DDPG) algorithm for trajectory tracking control of the manipulator. SHER randomly samples the experience collected during exploration, selects the useful transitions, and relabels their rewards: by raising the reward assigned to correct actions, it strengthens the positive feedback the agent receives for those actions, which improves exploration efficiency and enables faster learning of effective policies. To verify the effectiveness of the method, a two-degree-of-freedom manipulator was modeled with the Euler-Lagrange formulation and comparative simulations were carried out in a complex environment with disturbances. The results show that the proposed algorithm achieves the best convergence speed and convergence stability on the manipulator trajectory tracking task among the compared algorithms, and that the trained model also performs best on the tracking task, validating the effectiveness of the algorithm.
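The abstract describes the replay mechanism only at a high level. Below is a minimal, illustrative Python sketch of a hindsight-style filtered replay buffer with reward relabeling; the class name SelectiveHindsightReplay, the select_ratio and tol parameters, and the reward-shaping rule are assumptions made for illustration, not the authors' exact SHER implementation or its DDPG integration.

import random
from collections import deque

import numpy as np


class SelectiveHindsightReplay:
    """Illustrative hindsight-style replay buffer with reward relabeling.

    Each transition stores the achieved and desired (reference) positions
    alongside the usual DDPG tuple. When a batch is sampled, a fraction of
    the transitions is screened; those whose achieved position lies within
    a tolerance of the reference get a boosted reward, strengthening the
    positive feedback for correct actions.
    """

    def __init__(self, capacity=100_000, select_ratio=0.5, tol=0.05):
        self.buffer = deque(maxlen=capacity)
        self.select_ratio = select_ratio  # fraction of sampled transitions screened for relabeling
        self.tol = tol                    # tracking-error tolerance for a "useful" transition

    def store(self, state, action, reward, next_state, achieved, desired):
        self.buffer.append((state, action, reward, next_state, achieved, desired))

    def sample(self, batch_size=64):
        batch = random.sample(list(self.buffer), min(batch_size, len(self.buffer)))
        relabeled = []
        for state, action, reward, next_state, achieved, desired in batch:
            if random.random() < self.select_ratio:
                err = float(np.linalg.norm(np.asarray(achieved) - np.asarray(desired)))
                if err < self.tol:
                    # Boost the reward for near-correct actions (illustrative shaping rule).
                    reward = reward + 1.0 - err / self.tol
            relabeled.append((state, action, reward, next_state))
        return relabeled

In a DDPG training loop, the relabeled batch returned by sample() would stand in for uniformly sampled transitions in the critic and actor updates; the selection ratio and tolerance would have to be tuned to the tracking task.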

Keywords: selective hindsight experience replay; deep deterministic policy gradient; deep reinforcement learning; trajectory tracking; robotic manipulator; experience replay buffer optimization

CLC number: TP301 [Automation and Computer Technology: Computer System Architecture]

 
