Authors: XIN Yuanxia; HUA Daoyang; ZHANG Li
Affiliations: [1] School of Software Technology, Zhejiang University, Ningbo, Zhejiang 315103, China; [2] School of Physics, Zhejiang University, Hangzhou 310027, China; [3] College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
Source: Computer Science (《计算机科学》), 2024, No. 5, pp. 179-192 (14 pages)
Abstract: Deep reinforcement learning algorithms have achieved notable results across many application domains. In multi-agent tasks, however, agents often face non-stationary environments with large state-action spaces and sparse rewards, so low exploration efficiency remains a major challenge. Since AI planning can quickly produce a solution from the initial state and goal state of a task, and since that solution can serve as each agent's initial policy and effectively guide its exploration, this work combines AI planning with multi-agent reinforcement learning and proposes UniMP (a Unified model for Multi-agent reinforcement learning and AI Planning). On this basis, a corresponding problem-solving mechanism is designed: the multi-agent reinforcement learning task is first transformed into an intelligent decision (planning) task; heuristic search is then performed on it to obtain a set of macro goals, which guide the training of reinforcement learning so that the agents can explore more efficiently. Experiments on various maps of the multi-agent real-time strategy game StarCraft II and on the RMAICS (RoboMaster AI Challenge Simulator 2D) combat-vehicle simulation environment show that both cumulative reward and win rate improve significantly, verifying the feasibility of the unified model, the effectiveness of the solution mechanism, and the ability of the proposed algorithm to flexibly handle unexpected situations in the reinforcement learning environment.
Keywords: multi-agent reinforcement learning; AI planning; heuristic search; exploration efficiency
Classification: TP181 [Automation and Computer Technology / Control Theory and Control Engineering]
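The pipeline described in the abstract (plan with heuristic search, extract macro goals, use them to guide reinforcement-learning exploration) can be sketched in miniature as below. This is a toy illustration under loose assumptions, not the paper's UniMP implementation: the grid-world planner, the `stride` subsampling of the path into macro goals, and the fixed reward bonus are all hypothetical stand-ins for the actual planning domain and guidance mechanism.

```python
import heapq

def heuristic_search(start, goal, obstacles, size):
    """Greedy best-first search on a size x size grid.
    Stands in for the AI planner that maps an initial state and a
    goal state to a plan (here, a path of grid cells)."""
    def h(cell):  # Manhattan-distance heuristic
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(h(start), start)]
    came_from = {start: None}
    while frontier:
        _, cur = heapq.heappop(frontier)
        if cur == goal:  # reconstruct the plan by walking back
            path = []
            while cur is not None:
                path.append(cur)
                cur = came_from[cur]
            return path[::-1]
        x, y = cur
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nxt[0] < size and 0 <= nxt[1] < size
                    and nxt not in obstacles and nxt not in came_from):
                came_from[nxt] = cur
                heapq.heappush(frontier, (h(nxt), nxt))
    return []  # no plan found

def macro_goals(path, stride=3):
    """Subsample the planned path into a set of macro goals;
    `stride` controls how coarse the guidance is (an assumption)."""
    goals = path[stride::stride]
    if path and (not goals or goals[-1] != path[-1]):
        goals.append(path[-1])  # always keep the final goal
    return goals

def shaped_reward(agent_pos, pending_goals, env_reward, bonus=0.5):
    """Guide exploration: grant a bonus when the agent reaches its
    next pending macro goal, on top of the sparse environment reward."""
    if pending_goals and agent_pos == pending_goals[0]:
        pending_goals.pop(0)
        return env_reward + bonus
    return env_reward
```

In this sketch the planner runs once up front, and each agent's RL loop would call `shaped_reward` every step, so the sparse environment reward is densified along the planned route while the underlying policy is still learned by reinforcement learning.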