Authors: 俞文武 (YU Wen-Wu), 杨晓亚 (YANG Xiao-Ya), 李海昌 (LI Hai-Chang) [1], 王瑞 (WANG Rui) [1], 胡晓惠 (HU Xiao-Hui) [1]
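Affiliations: [1] Science and Technology on Integrated Information System Laboratory, Institute of Software, Chinese Academy of Sciences, Beijing 100190; [2] University of Chinese Academy of Sciences, Beijing 100049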
Source: Acta Automatica Sinica (《自动化学报》), 2023, No. 11, pp. 2311-2325 (15 pages)
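Funding: Supported by the National Key Research and Development Program of China (2019YFB1405100) and the National Natural Science Foundation of China (61802380, 61802016).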
Abstract: For multi-agent communication and cooperation tasks in partially observable environments, most existing studies use only the hidden-layer information of the network at the current time step, which limits the sources of information. This paper studies how to train a set of independent policies with team rewards and how to improve the cooperative performance of those policies. It proposes the multi-agent attentional intention and communication (MAAIC) algorithm, which adds an intention information module to broaden the sources of communicated information and improves the communication mode. The network that has performed best over an agent's history is taken as the intention network, and policy intention information is extracted from it. This historical intention information is retained in chronological order as a vector and combined with an attention mechanism and the current observation history to infer more effective communication information as input for decision-making. Experimental comparison and analysis on the StarCraft Multi-Agent Challenge verify the effectiveness of the algorithm.
Classification number: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
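Illustrative sketch: the abstract describes combining chronologically stored intention vectors with an attention mechanism. This record does not give the MAAIC architecture, so the code below is only a generic scaled dot-product attention in plain Python, not the authors' implementation. The names `attend`, `query`, and `intention_history`, as well as the toy vectors, are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, intention_history):
    """Scaled dot-product attention of one query over stored vectors.

    query: hypothetical encoding of the current observation history.
    intention_history: equal-length intention vectors, oldest first.
    Returns (weights, context), where context is the weighted sum.
    """
    dim = len(query)
    scores = [
        sum(q * k for q, k in zip(query, vec)) / math.sqrt(dim)
        for vec in intention_history
    ]
    weights = softmax(scores)
    context = [
        sum(w * vec[i] for w, vec in zip(weights, intention_history))
        for i in range(dim)
    ]
    return weights, context


# Toy data (assumed): three stored intention vectors and one query.
history = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attend([1.0, 0.0], history)
```

Stored vectors that align with the query receive larger weights, so `context` leans toward the intentions most relevant to the current observation. How MAAIC actually forms its queries, keys, and values is not stated in this record.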