Authors: 李伟科 (LI Wei-ke), 岳洪伟 (YUE Hong-wei), 王宏民 (WANG Hong-min), 杨勇 (YANG Yong), 赵敏 (ZHAO Min), 邓辅秦 (DENG Fu-qin) (Department of Intelligent Manufacturing, Wuyi University, Jiangmen, Guangdong 529020, China; Shenzhen Institute of Artificial Intelligence and Robotics for Society, Shenzhen, Guangdong 518116, China; 3irobotix, Shenzhen, Guangdong 518000, China; CETC Potevio Science & Technology Co., Ltd., Guangzhou, Guangdong 510310, China)
Affiliations: [1] Department of Intelligent Manufacturing, Wuyi University, Jiangmen, Guangdong 529020, China; [2] Shenzhen Institute of Artificial Intelligence and Robotics for Society, Shenzhen, Guangdong 518116, China; [3] Shenzhen 3irobotix Co., Ltd., Shenzhen, Guangdong 518006, China; [4] R&D Center, CETC Potevio Science & Technology Co., Ltd., Guangzhou, Guangdong 510310, China
Source: Computing Technology and Automation (《计算技术与自动化》), 2022, No. 3, pp. 6-13 (8 pages)
Funding: National Key R&D Program of China (2020YFB1313300); Guangdong Provincial Joint Graduate Training Demonstration Base Project (503170060259); Guangdong Basic and Applied Basic Research Foundation (2019A1515111119); Shenzhen Science and Technology Program (KQTD2016113010470345).
Abstract: In traditional reinforcement learning, a modular self-reconfigurable robot has no prior knowledge of its surroundings at the start of training and therefore selects actions at random, which wastes iterations and slows convergence. To address this, a two-stage reinforcement learning algorithm is proposed. In the first stage, a population-based Q-learning with knowledge sharing trains the robots to move toward the centre point of a grid map and yields an optimal shared Q table. To reduce the number of iterations and speed up convergence in this stage, the Manhattan distance is introduced as the reward value, guiding each robot toward the centre point and mitigating the effect of sparse rewards. In the second stage, each robot uses the shared Q table and its current position to find the optimal path to its assigned target point, so that the specified formation is formed. Experimental results on a 50×50 grid map show that, compared with the baseline algorithm, the proposed algorithm successfully trains the robots to reach their target points while reducing the total number of exploration steps by nearly 50%; in addition, when the robots switch formations, the formation runtime is reduced by nearly a factor of five.
Classification: TP39 [Automation and Computer Technology — Computer Application Technology]
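The first stage described in the abstract (shared Q-learning on a grid map with a Manhattan-distance-shaped reward) can be illustrated with a small sketch. The Python code below is a minimal illustration under assumed hyper-parameters (learning rate, exploration rate, episode count, goal bonus); it is not the authors' implementation, and the second stage's routing to arbitrary formation targets is only approximated here by a greedy rollout toward the centre point.

```python
import numpy as np

# Sketch of stage one: tabular Q-learning on a grid, with the reward shaped by
# the change in Manhattan distance to the map centre. All constants below are
# illustrative assumptions, not values from the paper.

GRID = 50                                      # 50x50 grid map, as in the experiments
CENTER = (GRID // 2, GRID // 2)                # shared goal of stage one
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]   # right, left, down, up

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def step(state, action):
    """Move on the grid, clipping at the borders, and return the shaped reward."""
    nxt = (min(max(state[0] + action[0], 0), GRID - 1),
           min(max(state[1] + action[1], 0), GRID - 1))
    # Shaped reward: positive when the move reduces the Manhattan distance to
    # the centre, negative otherwise; a bonus (assumed value) on reaching it.
    reward = manhattan(state, CENTER) - manhattan(nxt, CENTER)
    done = nxt == CENTER
    if done:
        reward += 10.0
    return nxt, reward, done

def train_shared_q(episodes=300, alpha=0.1, gamma=0.9, eps=0.2, seed=0):
    """Several robots update one shared Q table (knowledge sharing across episodes)."""
    rng = np.random.default_rng(seed)
    q = np.zeros((GRID, GRID, len(ACTIONS)))
    for _ in range(episodes):
        state = (int(rng.integers(GRID)), int(rng.integers(GRID)))  # random start
        for _ in range(4 * GRID * GRID):                            # cap episode length
            if rng.random() < eps:
                a = int(rng.integers(len(ACTIONS)))                 # explore
            else:
                a = int(np.argmax(q[state[0], state[1]]))           # exploit
            nxt, r, done = step(state, ACTIONS[a])
            best_next = np.max(q[nxt[0], nxt[1]])
            q[state[0], state[1], a] += alpha * (r + gamma * best_next - q[state[0], state[1], a])
            state = nxt
            if done:
                break
    return q

def greedy_path(q, start, goal=CENTER, max_steps=200):
    """Stage-two stand-in: follow the shared Q table greedily from `start` to `goal`."""
    path, state = [start], start
    while state != goal and len(path) < max_steps:
        a = int(np.argmax(q[state[0], state[1]]))
        state, _, _ = step(state, ACTIONS[a])
        path.append(state)
    return path

if __name__ == "__main__":
    shared_q = train_shared_q()
    print(greedy_path(shared_q, (0, 0))[:10])
```

The per-step reward (decrease in Manhattan distance) is what stands in for the sparse-reward mitigation described in the abstract: the robot receives informative feedback on every move rather than only at the goal.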