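Affiliations: [1] School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006; [2] Jiangsu Provincial Key Laboratory for Computer Information Processing Technology, Soochow University, Suzhou, Jiangsu 215006
Source: 计算机学报 (Chinese Journal of Computers), 2024, No. 7, pp. 1582-1594 (13 pages)
Funding: National Natural Science Foundation of China (62376179, 62176175); Natural Science Foundation of Xinjiang Uygur Autonomous Region (2022D01A238); Priority Academic Program Development of Jiangsu Higher Education Institutions.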
Abstract: In multi-agent collaboration, reinforcement learning algorithms achieve cooperation by sharing the agents' local information. However, this shared-collaboration mechanism easily causes over-collaboration. Agents neglect their own local observations, lose policy diversity, and ultimately fall into inefficient collaboration. To address this problem, this paper proposes Bi-View Modeling Collaborative Multi-Agent Reinforcement Learning (BVM-CMARL). The method models each agent from two views, local and global. The local view produces diverse policies, and the global view encourages collaboration. From the local view, it maximizes the mutual information between a local variational representation and the agent's own trajectory, which encourages policy diversity. From the global view, it maximizes the mutual information between a global variational representation and the other agents' actions, which raises the level of collaboration. Finally, the local Q-value trained from the local representation and the global Q-value trained from the global representation are combined to avoid inefficient collaboration. BVM-CMARL was evaluated in environments including the StarCraft Multi-Agent Challenge (SMAC), Level-Based Foraging (LBF), and Hallway. It was compared with five state-of-the-art reinforcement learning algorithms: QMIX, QPLEX, RODE, EOI, and MAVEN. BVM-CMARL shows better stability and performance than all of them. Its average win rate on SMAC is 82.81%, which is 13.42% higher than the second-best algorithm, RODE. Ablation experiments on designed model variants show that bi-view modeling is necessary for BVM-CMARL.
In recent years, artificial intelligence technology has advanced notably and now plays a crucial role in a wide range of real-world applications. Among its branches, reinforcement learning is a key discipline for complex sequential decision-making problems and plays a vital role in control tasks. Deep reinforcement learning builds on progress in neural network theory and computing power. It has transformed conventional reinforcement learning by integrating deep learning techniques into agents' decision-making frameworks. The Deep Q-Network (DQN) is a prime illustration. It uses a convolutional neural network to process visual inputs from Atari 2600 games and uses those features to update the agent's policy. Complex deep reinforcement learning tasks often involve multiple agents and are therefore formulated as multi-agent reinforcement learning. This framework has achieved remarkable success in domains such as traffic control, sensor networks, and game AI. In multi-agent reinforcement learning, agents can learn to collaborate through the Centralized Training with Decentralized Execution (CTDE) mechanism. Under CTDE, reinforcement learning algorithms achieve cooperative behavior by having agents share local information as part of the cooperation process. This shared cooperation mechanism makes complex multi-agent tasks solvable in many fields. It also creates a problem: excessive cooperation between agents can itself become a source of conflict. Agents begin to overlook their own current local observations when cooperating and lose policy diversity. Eventually they collaborate inefficiently. To address this problem, we propose a Bi-View Modeling Collaborative Multi-Agent Reinforcement Learning...
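Both views in the abstract rest on maximizing a mutual-information term, for example between a local representation and the agent's own trajectory. The abstract does not say how BVM-CMARL estimates these terms. Because the keywords list contrastive learning, one plausible choice is the standard InfoNCE lower bound, I(X; Y) ≥ log N − L_NCE, computed over a batch of N positive pairs. The sketch below illustrates that bound only and is not the paper's implementation. The function names, the dot-product critic, and the toy embeddings are all hypothetical.

```python
import math


def dot(u, v):
    """Dot product used here as a hypothetical critic score f(x, y)."""
    return sum(a * b for a, b in zip(u, v))


def info_nce_bound(queries, keys, temperature=1.0):
    """Standard InfoNCE lower bound on I(Q; K) from N paired samples.

    queries[i] and keys[i] form a positive pair (e.g. a hypothetical local
    representation and an encoding of the same agent's trajectory); every
    keys[j] with j != i serves as a negative for queries[i].
    Returns log(N) - L_NCE, which can never exceed log(N).
    """
    n = len(queries)
    loss = 0.0
    for i in range(n):
        scores = [dot(queries[i], keys[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidate keys.
        m = max(scores)
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
        # Cross-entropy of picking the positive key among the N candidates.
        loss += -(scores[i] - log_sum_exp)
    loss /= n
    return math.log(n) - loss


if __name__ == "__main__":
    # Toy embeddings: four orthogonal vectors scaled by 10.
    q = [[10.0 if k == i else 0.0 for k in range(4)] for i in range(4)]
    print(info_nce_bound(q, q))              # matched pairs: bound reaches log(4)
    print(info_nce_bound(q, q[1:] + q[:1]))  # mismatched pairs: bound collapses
```

Maximizing this bound pulls each positive pair together and pushes the in-batch negatives apart. The estimate saturates at log N, so larger batches allow larger mutual-information values to be measured.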
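Keywords: deep reinforcement learning; multi-agent system; multi-agent collaboration; collaboration modeling; contrastive learning
CLC number: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]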