Path planning method for citrus picking manipulator based on deep reinforcement learning

Cited by: 8

Authors: XIONG Chunyuan; XIONG Juntao[1]; YANG Zhengang[1]; HU Wenxin (College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, China)

Affiliation: [1] College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, Guangdong, China

Source: Journal of South China Agricultural University, 2023, No. 3, pp. 473-483 (11 pages)

Funding: National Natural Science Foundation of China (32071912); Guangzhou Basic Research Program (202102080337).

Abstract:

[Objective] To address the low training efficiency and poor path planning success rate that arise when deep reinforcement learning (DRL) is used to plan paths for a picking manipulator in unstructured environments, this study proposes a path planning method for citrus picking manipulators that combines DRL with an artificial potential field.

[Method] First, the picking path planning problem was solved with reinforcement learning, and a reinforcement learning method incorporating the artificial potential field was designed. Second, a long short-term memory (LSTM) structure was introduced to improve the Actor and Critic networks of two DRL algorithms. Finally, the DRL algorithms were trained in three different unstructured citrus tree environments to plan paths for the picking manipulator.

[Result] Simulation comparison experiments showed that combining reinforcement learning with the artificial potential field effectively improved the path planning success rate of the picking manipulator. Introducing the LSTM structure improved the convergence speed of the deep deterministic policy gradient (DDPG) algorithm by 57.25% and its path planning success rate by 23.00%; for the soft actor-critic (SAC) algorithm, it improved the convergence speed by 53.73% and the path planning success rate by 9.00%. Compared with the traditional RRT-connect (rapidly exploring random trees connect) algorithm, the SAC algorithm with the LSTM structure shortened the planned path length by 16.20% and improved the path planning success rate by 9.67%.

[Conclusion] The proposed method has certain advantages in planned path length and path planning success rate, and can provide a reference for solving path planning problems of picking robots in unstructured environments.
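The abstract does not state how the artificial potential field enters the reinforcement learning setup. One common way to combine the two is to shape the step reward with the change in potential, so that moves toward the fruit and away from branches are rewarded. The sketch below illustrates that idea only; it is not the authors' formulation. The function names, the gains `k_att` and `k_rep`, the influence distance `d0`, the tolerances, and the terminal rewards of ±10 are all hypothetical choices made for illustration.

```python
import math

def attractive_potential(pos, goal, k_att=1.0):
    """Quadratic potential that grows with distance from the goal (assumed form)."""
    return 0.5 * k_att * math.dist(pos, goal) ** 2

def repulsive_potential(pos, obstacles, k_rep=1.0, d0=0.2):
    """Classic repulsive potential, non-zero only within d0 of an obstacle point."""
    total = 0.0
    for obs in obstacles:
        d = max(math.dist(pos, obs), 1e-6)  # guard against division by zero
        if d < d0:
            total += 0.5 * k_rep * (1.0 / d - 1.0 / d0) ** 2
    return total

def potential(pos, goal, obstacles):
    """Total field value at the end-effector position."""
    return attractive_potential(pos, goal) + repulsive_potential(pos, obstacles)

def shaped_reward(pos, next_pos, goal, obstacles,
                  goal_tol=0.02, collision_tol=0.03):
    """Step reward for a transition pos -> next_pos (hypothetical design).

    Collision and goal arrival give fixed terminal rewards; otherwise the
    reward is the decrease in potential achieved by the step.
    """
    if any(math.dist(next_pos, o) < collision_tol for o in obstacles):
        return -10.0
    if math.dist(next_pos, goal) < goal_tol:
        return 10.0
    return potential(pos, goal, obstacles) - potential(next_pos, goal, obstacles)
```

In a setup like this, `shaped_reward` would be called once per environment step with the end-effector position before and after the action, and the result passed to the DDPG or SAC learner as the reward signal. Obstacles are modelled here as points for brevity; a real orchard scene would need a richer collision model.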

Keywords: picking manipulator; citrus; path planning; deep reinforcement learning; unstructured environment; LSTM

Classification: S666 [Agricultural Sciences: Pomology]; S233.4 [Agricultural Sciences: Horticulture]
