Attentional Intention and Communication Learning Method for Multi-agent Cooperation (cited by: 4)

Attentional Intention and Communication for Multi-agent Learning

Authors: YU Wen-Wu; YANG Xiao-Ya; LI Hai-Chang[1]; WANG Rui[1]; HU Xiao-Hui[1]

Affiliations: [1] Science and Technology on Integrated Information System Laboratory, Institute of Software, Chinese Academy of Sciences, Beijing 100190; [2] University of Chinese Academy of Sciences, Beijing 100049

Source: Acta Automatica Sinica (《自动化学报》), 2023, No. 11, pp. 2311-2325 (15 pages)

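Funding: Supported by the National Key Research and Development Program of China (2019YFB1405100) and the National Natural Science Foundation of China (61802380, 61802016).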

Abstract: For multi-agent communication and cooperation tasks in partially observable environments, most existing studies use only the hidden-layer information of the network at the current time step, which limits the sources of information available for communication. This paper studies how to train a set of independent policies with a team reward and how to improve the cooperative performance of those policies. It proposes the multi-agent attentional intention and communication (MAAIC) algorithm, which adds an intention information module to broaden the sources of communication information and improves the communication mode. The network that has performed best over an agent's history is taken as the intention network, and policy intention information is extracted from it. This intention information is retained as a vector in chronological order and combined, through an attention mechanism, with current observation-history information to infer more effective communication information as input for decision-making. Experimental comparison and analysis on the StarCraft multi-agent challenge verify the effectiveness of the algorithm.
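The abstract names three mechanisms: keeping the historically best network as the intention network, retaining intention vectors in chronological order, and using attention to turn that history into a message. The following is a minimal sketch of those three ideas only, not the authors' implementation. The abstract does not state which attention form MAAIC uses, so scaled dot-product attention is assumed here, and every name (`update_intention_snapshot`, `IntentionBuffer`, `attend`), the buffer length, and the toy vectors are hypothetical.

```python
import copy
import math
from collections import deque


def update_intention_snapshot(best, candidate_params, candidate_return):
    """Keep a copy of the parameters with the highest return seen so far.

    `best` is None or a (params, return) tuple. This is an assumed stand-in
    for "the network with the best historical performance".
    """
    if best is None or candidate_return > best[1]:
        return (copy.deepcopy(candidate_params), candidate_return)
    return best


class IntentionBuffer:
    """Holds the most recent intention vectors in chronological order."""

    def __init__(self, maxlen):
        # A bounded deque drops the oldest entry once maxlen is exceeded.
        self._buf = deque(maxlen=maxlen)

    def push(self, vec):
        self._buf.append(list(vec))

    def as_list(self):
        return list(self._buf)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, history):
    """Scaled dot-product attention of one query over a list of vectors.

    Returns the weighted sum of the history (the "message") and the weights.
    """
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, vec)) / math.sqrt(d)
        for vec in history
    ]
    weights = softmax(scores)
    message = [
        sum(w * vec[i] for w, vec in zip(weights, history))
        for i in range(d)
    ]
    return message, weights


# Toy usage: three evaluation rounds, then attention over a short history.
best = None
for params, ret in (({"w": 1}, 0.2), ({"w": 2}, 0.5), ({"w": 3}, 0.3)):
    best = update_intention_snapshot(best, params, ret)

buf = IntentionBuffer(maxlen=3)
for vec in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]):
    buf.push(vec)
history = buf.as_list()  # the oldest vector has been dropped

# The query stands in for an encoding of the current observation history.
message, weights = attend([1.0, 0.0], history)
```

In this sketch the query plays the role of the current observation-history encoding, so intention vectors that align with it receive larger weights in the resulting message. A learned variant would replace the raw dot product with trained query, key, and value projections.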

Keywords: multi-agent; reinforcement learning; intention communication; attention mechanism

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
