Disruptor identifiable communication based on reinforcement and imitation learning for multi-agent path finding

Authors: Li Mengtian (李梦甜); Xiang Yingcen (向颖岑); Xie Zhifeng (谢志峰) [1,2]; Ma Lizhuang (马利庄)

Affiliations: [1] Dept. of Film & Television Engineering, Shanghai University, Shanghai 200072, China; [2] Shanghai Film Special Effects Engineering Technology Research Center, Shanghai University, Shanghai 200072, China; [3] Dept. of Computer Science & Engineering, Shanghai Jiao Tong University, Shanghai 200240, China

Source: Application Research of Computers (《计算机应用研究》), 2024, No. 8, pp. 2474-2480 (7 pages)

Abstract: Most existing multi-agent path finding (MAPF) methods based on communication learning either scale poorly or aggregate too much redundant information, which makes communication inefficient. To address these problems, this paper proposes disruptor identifiable communication (DIC), which judges whether the decision of the agent at the center of the field of view (FOV) changes because of the presence of a neighbor, and thereby learns concise communication that excludes non-disruptors and filters out redundant information. The paper further instantiates DIC in a new, highly scalable distributed MAPF solver: the disruptor identifiable communication based on reinforcement and imitation learning algorithm (DICRIA). First, the disruptor discriminator works with the policy output layer of DICRIA to identify disruptors. Second, two rounds of communication update the information of the disruptors and of the communication-wish senders, respectively. Finally, DICRIA outputs the final decision based on the encoding results of each module. Experimental results show that DICRIA outperforms other comparable solvers in almost all environment settings and raises the success rate by 5.2% on average compared with the baseline solver. In dense problem instances on large maps in particular, DICRIA's success rate is as much as 44.5% higher than that of the baseline solver.
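The core test in DIC, whether the central agent's decision changes because a neighbor is present, can be illustrated with a minimal sketch. It is not the authors' implementation: the hand-written `toy_policy` stands in for the learned DICRIA policy, and `identify_disruptors`, the grid coordinates, and the neighbor names are all hypothetical. The sketch assumes that a neighbor counts as a disruptor when masking it out of the observation changes the chosen action.

```python
from typing import Callable, Dict, List, Tuple

Pos = Tuple[int, int]
Policy = Callable[[Pos, Pos, List[Pos]], str]

# Action name -> (dx, dy) offset on a grid. Hypothetical action set.
MOVES: Dict[str, Pos] = {
    "up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0), "stay": (0, 0),
}


def toy_policy(pos: Pos, goal: Pos, neighbors: List[Pos]) -> str:
    """Toy stand-in for a learned policy (assumption, not DICRIA itself).

    Picks the action whose target cell is closest to the goal in Manhattan
    distance among the cells not occupied by a visible neighbor.
    """
    occupied = set(neighbors)

    def dist_after(move: str) -> int:
        dx, dy = MOVES[move]
        return abs(goal[0] - (pos[0] + dx)) + abs(goal[1] - (pos[1] + dy))

    for move in sorted(MOVES, key=dist_after):
        dx, dy = MOVES[move]
        if (pos[0] + dx, pos[1] + dy) not in occupied:
            return move
    return "stay"


def identify_disruptors(policy: Policy, pos: Pos, goal: Pos,
                        neighbors: Dict[str, Pos]) -> List[str]:
    """Return the neighbors whose removal changes the central agent's action."""
    baseline = policy(pos, goal, list(neighbors.values()))
    disruptors = []
    for name in neighbors:
        # Mask out one neighbor and query the policy again.
        masked = [p for other, p in neighbors.items() if other != name]
        if policy(pos, goal, masked) != baseline:
            disruptors.append(name)
    return disruptors


if __name__ == "__main__":
    # Neighbor "a" blocks the cell on the way to the goal; "b" is far away.
    print(identify_disruptors(toy_policy, (0, 0), (3, 0), {"a": (1, 0), "b": (0, 5)}))
```

In this toy setting only the blocking neighbor is flagged, so communication would be restricted to it and the distant neighbor would be skipped as a non-disruptor.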

Keywords: multi-agent path finding; reinforcement learning; imitation learning; disruptor identifiable communication

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]
