Simulation of Sequential Pattern Mining for Obstacle Avoidance Trajectory of Robotic Arms under DDPG Algorithm


Authors: LI Lu-ke (李路可); YANG Jie (杨杰)

Affiliations: [1] School of Engineering, Zhengzhou Technology and Business University, Zhengzhou, Henan 451400, China; [2] School of Mechanical Engineering, North China University of Water Resources and Electric Power, Zhengzhou, Henan 450045, China

Source: Computer Simulation (《计算机仿真》), 2024, No. 11, pp. 448-452 (5 pages)

Abstract: A robotic arm generates a large volume of trajectory data while it moves. Because of sensor error, environmental instability, and other factors, the collected trajectory data contain noise and uncertainty. These disturbances reduce the accuracy of pattern mining and make it difficult to extract successful samples. To address this problem, a DDPG-based method for mining sequential patterns in robotic arm obstacle avoidance trajectories is proposed. The obstacle avoidance problem is first analyzed to establish the fundamental goal of mining the obstacle avoidance trajectory sequence pattern. Deep Deterministic Policy Gradient (DDPG) is then chosen as the base algorithm, and a reward function is designed for it to improve convergence. SumTree is introduced into the experience replay of DDPG to build a weighted-sampling DDPG, which is used to mine the optimal obstacle avoidance trajectory sequence pattern. Experimental results show that the proposed method achieves a mining success rate above 96% and a mining time within 2 ms, and that it effectively raises the mean cumulative reward.
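The abstract names SumTree-based weighted sampling in the DDPG experience replay but gives no implementation details. The following is a minimal sketch of how a sum tree is commonly used for priority-proportional sampling, not the authors' code. The class and function names (`SumTree`, `sample_batch`), the stored item format, and the choice of priority values are all assumptions made for illustration; in practice the priority is often derived from the TD error, which the abstract does not confirm.

```python
import random


class SumTree:
    """Binary sum tree: leaves hold priorities, internal nodes hold child sums.

    Hypothetical illustration only; not taken from the paper.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity - 1)  # internal nodes, then leaves
        self.data = [None] * capacity           # stored transitions
        self.write = 0                          # next slot to overwrite

    def total(self):
        """Sum of all priorities (value at the root)."""
        return self.tree[0]

    def update(self, leaf, priority):
        """Set a leaf's priority and propagate the change up to the root."""
        change = priority - self.tree[leaf]
        self.tree[leaf] = priority
        while leaf > 0:
            leaf = (leaf - 1) // 2
            self.tree[leaf] += change

    def add(self, priority, item):
        """Store an item, overwriting the oldest one once the buffer is full."""
        leaf = self.write + self.capacity - 1
        self.data[self.write] = item
        self.update(leaf, priority)
        self.write = (self.write + 1) % self.capacity

    def get(self, value):
        """Find the leaf whose cumulative priority interval contains value."""
        idx = 0
        while 2 * idx + 1 < len(self.tree):
            left = 2 * idx + 1
            if value <= self.tree[left]:
                idx = left
            else:
                value -= self.tree[left]
                idx = left + 1
        return idx, self.tree[idx], self.data[idx - self.capacity + 1]


def sample_batch(tree, batch_size, rng=random):
    """Stratified sampling: one draw from each equal slice of the total."""
    segment = tree.total() / batch_size
    return [
        tree.get(rng.uniform(segment * i, segment * (i + 1)))
        for i in range(batch_size)
    ]
```

With this layout, both updating a priority and drawing a sample take time logarithmic in the buffer size, which is the usual reason for choosing a sum tree over scanning a flat priority list.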

Keywords: deep deterministic policy gradient; robotic arm obstacle avoidance; trajectory sequence pattern; reward function

Classification number: TP242 [Automation and Computer Technology: Detection Technology and Automatic Equipment]

 
