Transport robot path planning based on an advantage dueling double deep Q-network  Cited by: 5


Authors: HE Qijia; WANG Qiming [1,3]; LI Jiaxuan; WANG Zhengjia; WANG Tong

Affiliations: [1] National Astronomical Observatory, Chinese Academy of Sciences, Beijing 100101, China [2] University of Chinese Academy of Sciences, Beijing 100049, China [3] Key Laboratory of FAST, Chinese Academy of Sciences, Beijing 100101, China [4] Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China [5] Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110169, China [6] School of Computer Science and Technology, Nanjing Tech University, Nanjing 211816, China

Source: Journal of Tsinghua University (Science and Technology), 2022, No. 11, pp. 1751-1757 (7 pages)

Funding: National Key R&D Program of China (2019YFB1312702).

Abstract: This paper proposes an advantage dueling double deep Q-network (AD3QN) algorithm, based on deep reinforcement learning, as the path planning method for the transport robot in the automated actuator maintenance workshop of the Five-hundred-meter Aperture Spherical radio Telescope (FAST). Pre-learning the state value layer of the dueling network initializes the state value parameters according to the environment state, which reduces the number of steps needed to reach the target point for the first time. An improved greedy search algorithm in the dueling network makes the transition from exploring the environment to exploiting it better timed. An improved action selection strategy makes the robot's path planning less likely to become trapped in local minima, which further accelerates convergence. AD3QN offers strong dynamic planning capability, good real-time performance, high flexibility, strong robustness, and high accuracy. The automated actuator maintenance workshop was modeled, and the path planning capability of the network was tested before and after the improvements. Simulation results show that AD3QN finds the target point for the first time 176% faster than a general dueling network. This research is expected to improve the maintenance efficiency of FAST actuators and thereby reduce the observation time that maintenance takes away from FAST.
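AD3QN builds on three generic components named in the abstract: the dueling architecture (a state value V(s) plus per-action advantages A(s, a)), the double DQN target, and greedy (ε-greedy) exploration. The pure-Python sketch below shows only these standard building blocks; it does not reproduce the paper's specific improvements (pre-learned state value layer, modified greedy search, modified action selection), whose details the abstract does not give. All function names, the linear ε schedule, and its default parameters are hypothetical illustrations.

```python
import random
from typing import Sequence


def dueling_q(value: float, advantages: Sequence[float]) -> list[float]:
    """Combine V(s) and A(s, a) into Q(s, a) with mean-subtracted aggregation:
    Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(reward: float, done: bool, gamma: float,
                      q_online_next: Sequence[float],
                      q_target_next: Sequence[float]) -> float:
    """Double DQN target: the online network selects the next action,
    the target network evaluates it. Terminal states bootstrap nothing."""
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_action]


def linear_epsilon(step: int, eps_start: float = 1.0, eps_end: float = 0.05,
                   decay_steps: int = 1000) -> float:
    """Hypothetical linear decay from eps_start to eps_end (not the paper's schedule)."""
    if step >= decay_steps:
        return eps_end
    return eps_start + (step / decay_steps) * (eps_end - eps_start)


def epsilon_greedy(q_values: Sequence[float], epsilon: float,
                   rng: random.Random) -> int:
    """Explore with probability epsilon, otherwise pick the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    # Mean advantage is 1.0, so Q = [2.0, 4.0, 0.0].
    print(dueling_q(2.0, [1.0, 3.0, -1.0]))
    # Online net picks action 1; target net values it at 2.0 -> 1.0 + 0.5 * 2.0 = 2.0.
    print(double_dqn_target(1.0, False, 0.5, [0.2, 0.8, 0.5], [1.0, 2.0, 3.0]))
    rng = random.Random(0)
    for step in (0, 500, 1000):
        eps = linear_epsilon(step)
        print(step, eps, epsilon_greedy([0.1, 0.9, 0.3], eps, rng))
```

Subtracting the mean advantage keeps the split between V and A identifiable, and letting the online network choose the action while the target network scores it reduces the overestimation bias of plain DQN targets.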

Keywords: FAST actuator; deep reinforcement learning; dueling network; path planning

CLC number: TP241 [Automation and Computer Technology / Detection Technology and Automatic Equipment]

 
