Authors: Li Mengtian (李梦甜), Xiang Yingcen (向颖岑), Xie Zhifeng (谢志峰) [1,2], Ma Lizhuang (马利庄)
Affiliations: [1] Dept. of Film & Television Engineering, Shanghai University, Shanghai 200072, China; [2] Shanghai Film Special Effects Engineering Technology Research Center, Shanghai University, Shanghai 200072, China; [3] Dept. of Computer Science & Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
Source: Application Research of Computers (《计算机应用研究》), 2024, No. 8, pp. 2474-2480 (7 pages)
Abstract: Most existing multi-agent path finding (MAPF) methods based on communication learning either scale poorly or aggregate too much redundant information, which makes communication inefficient. To address these problems, this paper proposes disruptor identifiable communication (DIC), a mechanism that learns concise communication excluding non-disruptors by judging whether the decision of the agent at the center of the field of view (FOV) changes because of the presence of a neighbor, thereby filtering out redundant information. The paper further instantiates DIC in a new, highly scalable distributed MAPF solver: disruptor identifiable communication based on reinforcement and imitation learning algorithm (DICRIA). First, the disruptor discriminator works with the policy output layer of DICRIA to identify disruptors. Second, two rounds of communication update the information of the disruptors and of the senders of communication wishes, respectively. Finally, DICRIA outputs the final decision according to the encoding results of each module. Experimental results show that DICRIA outperforms other similar solvers in almost all environment settings and raises the success rate by 5.2% on average compared with the baseline solver. In dense problem instances on large maps in particular, DICRIA even raises the success rate by 44.5% compared with the baseline solver.
Keywords: multi-agent; path planning; reinforcement learning; imitation learning; disruptor identifiable communication
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
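The abstract describes identifying disruptors by checking whether the central agent's decision changes because a neighbor is present. A minimal sketch of that counterfactual test follows. It is an illustration only, not the authors' DICRIA implementation: the paper's discriminator is learned, whereas this sketch simply masks each neighbor out of the FOV and compares greedy actions. The names `identify_disruptors` and `toy_policy`, the 5x5 occupancy grid, and the five-action encoding are all assumptions made for the example.

```python
import numpy as np

# Hypothetical action encoding: 0 = stay, 1 = up, 2 = down, 3 = left, 4 = right.

def identify_disruptors(policy, fov, neighbor_cells):
    """Return the neighbor cells whose removal from the FOV changes the greedy action.

    policy: callable mapping an occupancy grid to a vector of action logits (assumed).
    fov: 2D array, 1 where a neighboring agent is visible, 0 elsewhere (assumed).
    neighbor_cells: list of (row, col) positions of neighbors inside the FOV.
    """
    base_action = int(np.argmax(policy(fov)))
    disruptors = []
    for row, col in neighbor_cells:
        masked = fov.copy()
        masked[row, col] = 0  # counterfactual: pretend this neighbor is absent
        if int(np.argmax(policy(masked))) != base_action:
            disruptors.append((row, col))
    return disruptors


def toy_policy(fov):
    """Stand-in policy: move right unless the cell right of center is occupied."""
    center = fov.shape[0] // 2
    logits = np.zeros(5)
    logits[4] = 1.0  # default preference: move right
    if fov[center, center + 1] == 1:
        logits[0] = 2.0  # blocked on the right, so staying scores higher
    return logits


fov = np.zeros((5, 5), dtype=int)
neighbors = [(2, 3), (0, 0)]  # one neighbor directly to the right, one in a far corner
for row, col in neighbors:
    fov[row, col] = 1

print(identify_disruptors(toy_policy, fov, neighbors))
```

In this toy setup only the neighbor blocking the cell to the right alters the decision, so it is the only one that would be kept for communication; the corner neighbor is filtered out as a non-disruptor.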