Authors: WANG Hanmo; ZHENG Shijie; XU Ruonan; GUO Bin; WU Lei (School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China)
Source: Computer Science (《计算机科学》), 2023, No. 6, pp. 266-273 (8 pages)
Funding: National Science Fund for Distinguished Young Scholars (62025205); National Natural Science Foundation of China (62032020, 62102317).
Abstract: Modular robots are assembled from a number of standard modules, each with independent functions. Self-reconfiguration is currently a central and difficult problem in modular robot research. When the number of modules is large and the problem is complex, traditional graph-theoretic or search algorithms cannot find a general optimal solution in polynomial time, and the complexity grows exponentially with the number of modules. Taking the perspective of deep reinforcement learning for swarm agents, this paper treats each homogeneous module as a single agent with learning and perception abilities and proposes a QMIX-based self-reconfiguration algorithm for modular robots. A new reward function is designed for the algorithm, and the agents move in parallel under a restricted action space. This addresses the coordination and cooperation problem among multiple agents to a certain extent and thereby achieves the transition from the initial configuration to the target configuration. Experiments with 9 modules compare the proposed algorithm with a traditional A*-based search algorithm in terms of success rate and average number of steps. The results show that, given a reasonable time-step limit, the QMIX-based self-reconfiguration algorithm reaches a success rate above 95%, both algorithms average about 12 steps, and the QMIX-based algorithm approaches the performance of the traditional algorithm.
Keywords: modular robot; self-reconfiguration; swarm agent collaboration; deep reinforcement learning; configuration space and motion space
Classification: TP242.6 [Automation and Computer Technology: Detection Technology and Automatic Devices]
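Illustrative note: the abstract names QMIX but gives no implementation details. As a rough sketch of the general QMIX idea only (not the authors' code), the snippet below mixes per-agent Q-values into a joint value using state-conditioned, non-negative weights, so the joint value never decreases when any one agent's value increases. The class name `TinyMixer`, the hidden size, the random initialisation, the single-layer hypernetworks, and the 18-value state encoding are all assumptions made for illustration; the paper's reward function and action-space restriction are not reproduced here.

```python
import math
import random


def elu(x):
    """Monotonically increasing activation (ELU with alpha = 1)."""
    return x if x > 0 else math.exp(x) - 1.0


def linear(rows, inputs):
    """Bias-free dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, inputs)) for row in rows]


class TinyMixer:
    """Illustrative QMIX-style mixer; sizes and initialisation are assumptions."""

    def __init__(self, n_agents, state_dim, hidden=4, seed=0):
        rng = random.Random(seed)

        def mat(n_rows):
            return [[rng.uniform(-1.0, 1.0) for _ in range(state_dim)]
                    for _ in range(n_rows)]

        self.n_agents, self.hidden = n_agents, hidden
        # Hypernetworks (single linear layers here) map the global state
        # to the weights and biases of the mixing network.
        self.hyper_w1 = mat(n_agents * hidden)
        self.hyper_b1 = mat(hidden)
        self.hyper_w2 = mat(hidden)
        self.hyper_b2 = mat(1)

    def q_tot(self, agent_qs, state):
        n, h = self.n_agents, self.hidden
        # abs() keeps the mixing weights non-negative, so Q_tot never
        # decreases when any single agent's Q-value increases.
        w1 = [abs(x) for x in linear(self.hyper_w1, state)]
        b1 = linear(self.hyper_b1, state)
        w2 = [abs(x) for x in linear(self.hyper_w2, state)]
        b2 = linear(self.hyper_b2, state)[0]
        hidden = [elu(sum(agent_qs[i] * w1[i * h + j] for i in range(n)) + b1[j])
                  for j in range(h)]
        return sum(hj * wj for hj, wj in zip(hidden, w2)) + b2


# Usage: 9 agents as in the abstract; the 18-value state (x, y per module)
# is a hypothetical encoding.
mixer = TinyMixer(n_agents=9, state_dim=18)
state = [0.1 * i for i in range(18)]
agent_qs = [0.0] * 9
print(mixer.q_tot(agent_qs, state))
```

Because the joint value is monotone in each agent's value, each module can pick its own greedy action and the combination is still greedy for the joint value, which is what makes decentralised execution possible in QMIX-style methods.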