Multi-Agent Collaborative Reinforcement Learning Method Based on Bi-View Modeling

Authors: LIU Quan [1,2], SHI Mei-Long, HUANG Zhi-Gang, ZHANG Li-Hua (School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006; Provincial Key Laboratory for Computer Information Processing Technology, Soochow University, Suzhou, Jiangsu 215006)

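Affiliations: [1] School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006 [2] Jiangsu Provincial Key Laboratory for Computer Information Processing Technology, Soochow University, Suzhou, Jiangsu 215006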

Source: Chinese Journal of Computers, 2024, No. 7, pp. 1582-1594 (13 pages)

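Funding: National Natural Science Foundation of China (62376179, 62176175); Natural Science Foundation of Xinjiang Uygur Autonomous Region (2022D01A238); Priority Academic Program Development of Jiangsu Higher Education Institutions.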

Abstract: In multi-agent collaboration, reinforcement learning algorithms achieve cooperation by sharing the agents' local information. This shared cooperation mechanism, however, easily leads to over-collaboration: agents neglect their own local observations, lose policy diversity, and end up collaborating inefficiently. To address this problem, this paper proposes a multi-agent collaborative reinforcement learning method based on bi-view modeling (Bi-View Modeling Collaborative Multi-Agent Reinforcement Learning, BVM-CMARL). The method models each agent from a local view and a global view, which are used to produce diverse policies and to encourage collaboration, respectively. In the local view, it maximizes the mutual information between a local variational variable and the agent's own trajectory, which encourages policy diversity. In the global view, it maximizes the mutual information between a global variational variable and the actions of the other agents, which raises the level of collaboration. Finally, the local Q-value trained with the local variational variable and the global Q-value trained with the global variational variable are merged to avoid inefficient collaboration. BVM-CMARL is evaluated on the StarCraft Multi-Agent Challenge (SMAC), Level-Based Foraging (LBF), and Hallway environments. Compared with five strong reinforcement learning algorithms (QMIX, QPLEX, RODE, EOI, and MAVEN), BVM-CMARL shows better stability and performance, with an average win rate of 82.81% on SMAC, 13.42% higher than the second-best algorithm, RODE. Ablation experiments with model variants show that bi-view modeling is necessary for BVM-CMARL.

In recent years, artificial intelligence has advanced notably and now plays a crucial role in a wide range of real-world applications. Among its branches, reinforcement learning is a key discipline for complex sequential decision-making problems and plays a vital role in control tasks. Building on progress in neural network theory and computational power, deep reinforcement learning integrates deep learning techniques into the decision-making of agents and has substantially improved on conventional reinforcement learning algorithms. For instance, the Deep Q-Network (DQN) uses a convolutional neural network to process visual inputs from Atari 2600 games and updates the policy of the reinforcement learning algorithm accordingly. Complex deep reinforcement learning tasks often involve multiple agents and are therefore formulated as multi-agent reinforcement learning, a framework that has been successful in domains such as traffic control, sensor networks, and gaming AI. In multi-agent reinforcement learning, agents can learn to collaborate through the Centralized Training with Decentralized Execution (CTDE) mechanism. Under CTDE, reinforcement learning algorithms realize cooperative behavior by sharing local information between agents. This shared cooperation mechanism makes it possible to solve complex multi-agent tasks in many fields, but it also gives rise to excessive cooperation between agents. As a consequence, agents begin to overlook their current local observations, lose policy diversity, and eventually collaborate inefficiently. To address this problem, we propose the Bi-View Modeling Collaborative Multi-Agent Reinforcement Learning (BVM-CMARL) method.
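The abstract names two mutual-information objectives and a merge of local and global Q-values, but gives neither the estimator nor the merge rule. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the variational variable is treated as a discrete latent z with a uniform prior, mutual information is bounded from below with the standard Barber-Agakov bound I(z; x) >= H(z) + E[log q(z | x)], and the two Q-value vectors are merged by a convex combination. All function names and the `weight` parameter are hypothetical.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a list of logits."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_norm for x in logits]


def mi_lower_bound(posterior_logits: Sequence[Sequence[float]],
                   latents: Sequence[int],
                   num_latents: int) -> float:
    """Barber-Agakov bound: I(z; x) >= H(z) + E[log q(z | x)].

    Assumption: z is uniform over num_latents values, so H(z) = log K.
    posterior_logits[n] holds the logits of a hypothetical variational
    posterior q(. | x_n); x_n stands for the agent's own trajectory
    (local view) or the other agents' actions (global view).
    """
    avg_log_q = sum(
        log_softmax(logits)[z] for logits, z in zip(posterior_logits, latents)
    ) / len(latents)
    return math.log(num_latents) + avg_log_q


def merge_q(local_q: Sequence[float],
            global_q: Sequence[float],
            weight: float = 0.5) -> List[float]:
    """Assumed merge rule: convex combination of per-action Q-values."""
    return [(1.0 - weight) * ql + weight * qg
            for ql, qg in zip(local_q, global_q)]


def greedy_action(q_values: Sequence[float]) -> int:
    """Index of the largest Q-value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    # A posterior that ignores its input carries no information about z.
    print(mi_lower_bound([[0.0] * 4] * 3, [0, 1, 2], num_latents=4))
    # Merging can change which action is preferred.
    print(greedy_action(merge_q([1.0, 0.2], [0.0, 2.0])))
```

In this sketch the bound is zero when the posterior ignores its input and approaches log K when the posterior identifies z exactly, so maximizing it pushes the latent to be recoverable from the trajectory or from the other agents' actions. The convex combination is only one possible way to merge the two Q-values.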

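Keywords: deep reinforcement learning; multi-agent systems; multi-agent collaboration; collaborative modeling; contrastive learning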

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
