Multirobot task allocation algorithm based on heuristically accelerated deep Q network (cited by: 16)

Authors: ZHANG Ziying; CHEN Yunfei; WANG Yuhua; FENG Guangsheng

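Affiliations: [1] College of Computer Science, Jiaying University, Meizhou 514015, Guangdong, China; [2] College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, Heilongjiang, China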

Source: Journal of Harbin Engineering University, 2022, No. 6, pp. 857-864 (8 pages)

Funding: National Natural Science Foundation of China (61502118).

Abstract: To address the curse of dimensionality that affects multirobot task allocation methods as environment complexity increases, this paper proposes a multirobot multitask allocation algorithm based on the heuristically accelerated deep Q network (HADQN). First, a neural network replaces the Q values of traditional reinforcement learning, which avoids the limitation that reinforcement learning faces in a high-dimensional state-action space. Second, a trajectory pool is introduced into the DQN algorithm to heuristically guide the action selection strategy, which improves the algorithm's convergence speed. Finally, a dynamic exploration factor is introduced into the action selection decision so that the algorithm fully explores the unknown space in the environment, thereby improving its learning efficiency. Experiments show that the HADQN-based task allocation algorithm alleviates the curse of dimensionality in multirobot multitask allocation in complex environments, and the experimental comparison shows that it significantly improves convergence speed and task allocation results.
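The abstract names two mechanisms layered on top of DQN: a trajectory pool that heuristically guides action selection, and a dynamic exploration factor. A minimal Python sketch of how they could fit together is given below. It is not the paper's implementation: the exponential decay schedule, the fixed bonus `xi`, the `TrajectoryPool` class, and every function and parameter name are assumptions made for illustration, and a plain list of Q estimates stands in for the neural network's output.

```python
import math
import random


def dynamic_epsilon(step, eps_start=1.0, eps_end=0.05, decay=500.0):
    """Exploration factor that decays from eps_start toward eps_end.

    The exponential schedule is an assumption; the abstract only says the
    factor is dynamic.
    """
    return eps_end + (eps_start - eps_end) * math.exp(-step / decay)


class TrajectoryPool:
    """Hypothetical pool of (state, action) pairs from past good episodes."""

    def __init__(self):
        self.counts = {}

    def add(self, trajectory):
        # trajectory: iterable of (state, action) pairs worth remembering
        for state, action in trajectory:
            key = (state, action)
            self.counts[key] = self.counts.get(key, 0) + 1

    def bonus(self, state, action, xi=1.0):
        # Fixed heuristic bonus xi for any pair present in the pool (assumed form)
        return xi if self.counts.get((state, action), 0) > 0 else 0.0


def select_action(q_values, state, pool, step, rng=random):
    """Pick an action index from Q estimates plus the heuristic bonus.

    q_values: list of Q estimates for `state`, one per action. In a DQN these
    would be the network's outputs; here they are passed in directly.
    """
    if rng.random() < dynamic_epsilon(step):
        # Explore: uniformly random action
        return rng.randrange(len(q_values))
    # Exploit: greedy over Q value plus trajectory-pool bonus
    scores = [q + pool.bonus(state, a) for a, q in enumerate(q_values)]
    return max(range(len(scores)), key=scores.__getitem__)
```

In this sketch the pool only biases the greedy branch, so early training is dominated by random exploration and the heuristic takes over as the exploration factor decays. Other designs, such as weighting the bonus by visit count, are equally compatible with the abstract.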

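Keywords: task allocation; neural network; reinforcement learning; Q value; high dimensionality; heuristically accelerated deep Q-learning; curse of dimensionality; dynamic exploration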

Classification code: TP242 [Automation and Computer Technology: Detection Technology and Automatic Devices]
