Authors: XIN Yuanxia; HUA Daoyang; ZHANG Li
Affiliations: [1] School of Software Technology, Zhejiang University, Ningbo, Zhejiang 315103, China; [2] School of Physics, Zhejiang University, Hangzhou 310027, China; [3] College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
Source: Computer Science (《计算机科学》), 2024, No. 5, pp. 179-192 (14 pages)
Abstract: Deep reinforcement learning algorithms have achieved notable results across many application domains. In multi-agent tasks, however, agents often face non-stationary environments with large state-action spaces and sparse rewards, so low exploration efficiency remains a major challenge. Since AI planning can quickly produce a solution from the initial state and goal state of a task, and since that solution can serve as each agent's initial policy and effectively guide its exploration, this work combines AI planning with multi-agent reinforcement learning and proposes UniMP (a Unified model for Multi-agent reinforcement learning and AI Planning). On this basis, a corresponding problem-solving mechanism is designed: the multi-agent reinforcement learning task is first transformed into an intelligent decision (planning) task; heuristic search is then performed on it to obtain a set of macro goals, which guide the training of reinforcement learning so that the agents can explore more efficiently. Experiments on various maps of the multi-agent real-time strategy game StarCraft II and on the RMAICS (RoboMaster AI Challenge Simulator 2D) combat-vehicle simulation environment show that both cumulative reward and win rate improve significantly, verifying the feasibility of the unified model, the effectiveness of the solution mechanism, and the ability of the proposed algorithm to flexibly handle unexpected situations in the reinforcement learning environment.
Keywords: multi-agent reinforcement learning; AI planning; heuristic search; exploration efficiency
Classification: TP181 [Automation and Computer Technology / Control Theory and Control Engineering]
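The pipeline described in the abstract (plan with heuristic search, extract macro goals, use them to guide reinforcement-learning exploration) can be sketched in miniature as below. This is a toy illustration under loose assumptions, not the paper's UniMP implementation: the grid-world planner, the `stride` subsampling of the path into macro goals, and the fixed reward bonus are all hypothetical stand-ins for the actual planning domain and guidance mechanism.

```python
import heapq

def heuristic_search(start, goal, obstacles, size):
    """Greedy best-first search on a size x size grid.
    Stands in for the AI planner that maps an initial state and a
    goal state to a plan (here, a path of grid cells)."""
    def h(cell):  # Manhattan-distance heuristic
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(h(start), start)]
    came_from = {start: None}
    while frontier:
        _, cur = heapq.heappop(frontier)
        if cur == goal:  # reconstruct the plan by walking back
            path = []
            while cur is not None:
                path.append(cur)
                cur = came_from[cur]
            return path[::-1]
        x, y = cur
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nxt[0] < size and 0 <= nxt[1] < size
                    and nxt not in obstacles and nxt not in came_from):
                came_from[nxt] = cur
                heapq.heappush(frontier, (h(nxt), nxt))
    return []  # no plan found

def macro_goals(path, stride=3):
    """Subsample the planned path into a set of macro goals;
    `stride` controls how coarse the guidance is (an assumption)."""
    goals = path[stride::stride]
    if path and (not goals or goals[-1] != path[-1]):
        goals.append(path[-1])  # always keep the final goal
    return goals

def shaped_reward(agent_pos, pending_goals, env_reward, bonus=0.5):
    """Guide exploration: grant a bonus when the agent reaches its
    next pending macro goal, on top of the sparse environment reward."""
    if pending_goals and agent_pos == pending_goals[0]:
        pending_goals.pop(0)
        return env_reward + bonus
    return env_reward
```

In this sketch the planner runs once up front, and each agent's RL loop would call `shaped_reward` every step, so the sparse environment reward is densified along the planned route while the underlying policy is still learned by reinforcement learning.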