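Affiliations: [1] State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023 [2] Experimental Teaching Center of Computer Science, Technology and Software Engineering, Nanjing University, Nanjing 210023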
Source: Chinese Journal of Computers (《计算机学报》), 2023, No. 9, pp. 1820-1837 (18 pages)
Funding: Supported by the 2018 Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (Grant No. 2018AAA0102302).
Abstract: In recent years, reinforcement learning has shown strong capability on sequential decision-making problems and has become an important branch of machine learning. As reinforcement learning is developed and studied in multi-agent systems, multi-agent reinforcement learning is expected to become a key technique for the emergence of swarm-intelligent behavior, but many scientific problems remain to be solved at the present stage. In multi-agent reinforcement learning, how to improve agents' ability to cooperate in collaborative scenarios has long been a popular research topic. Communication is considered a key element for achieving high-level cooperation among agents, so a number of studies approach the problem from the communication side and let agents communicate in order to cooperate better. Most existing communication-related work in multi-agent reinforcement learning focuses on partially observable problems, in which agents share part of their local observations through communication channels. Some recent studies have begun to examine how agents can cooperate better by sharing intentions. However, under an unrestricted intention-sharing framework, if an agent's final action differs from its original intention, it may mislead other agents, and introducing communication then has a negative effect. A new multi-agent intention-sharing framework is therefore needed, one that uses intention information effectively while avoiding intention misleading between agents. To address this problem, this paper proposes 2SIS, a new intention communication framework for multi-agent reinforcement learning based on the idea of exchanging intentions. Under 2SIS, agents communicate twice before making a decision: the first communication broadcasts intention information, and the second broadcasts intention dependency relationships. After the two communications, each agent builds its own intention dependency graph. To avoid intention misleading, 2SIS prohibits any agent that other agents depend on in the intention dependency graph from re-deciding based on other agents' intentions, so its final decision is exactly its initial intention; only agents that no other agent depends on are allowed to re-decide based on intention information. 2SIS can be combined with any value-function-based reinforcement learning algorithm for training. Agents trained under 2SIS learn to establish intention dependencies correctly, which yields one-way intention propagation, and they do not suffer from intention misleading. We select relatively representative ...
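The decision rule described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the function names (`two_stage_decide`, `avoid_collision`), the dictionary-based message format, and the hand-written re-decision rule are all assumptions made for illustration, and the learned value functions that 2SIS would train are replaced by a toy rule. The sketch only shows the freezing logic: an agent that any other agent depends on keeps its broadcast intention, while an agent that nobody depends on may re-decide using the intentions it chose to depend on.

```python
from typing import Callable, Dict, Set


def two_stage_decide(
    intentions: Dict[str, int],
    depends_on: Dict[str, Set[str]],
    redecide: Callable[[str, Dict[str, int]], int],
) -> Dict[str, int]:
    """Toy version of the two-round decision rule (hypothetical API).

    intentions: round-1 messages, agent name -> intended action.
    depends_on: round-2 messages, agent name -> agents whose intentions it uses.
    redecide:   stand-in for a learned policy conditioned on others' intentions.
    """
    # Every agent that appears as a dependency target of another agent.
    depended_upon = {
        target
        for source, targets in depends_on.items()
        for target in targets
        if target != source
    }

    final: Dict[str, int] = {}
    for agent, intent in intentions.items():
        targets = depends_on.get(agent, set()) - {agent}
        if agent in depended_upon or not targets:
            # Frozen: the final action equals the broadcast intention,
            # so nobody relying on this agent can be misled.
            final[agent] = intent
        else:
            # Free to re-decide, but only from intentions that are now fixed.
            visible = {t: intentions[t] for t in targets}
            final[agent] = redecide(agent, visible)
    return final


def avoid_collision(agent: str, seen: Dict[str, int]) -> int:
    """Assumed toy rule: pick the smallest action nobody in `seen` intends."""
    taken = set(seen.values())
    action = 0
    while action in taken:
        action += 1
    return action


# Round 1: intentions. Round 2: b depends on a, c depends on b.
intentions = {"a": 0, "b": 1, "c": 1}
depends_on = {"b": {"a"}, "c": {"b"}}
final = two_stage_decide(intentions, depends_on, avoid_collision)
print(final)
```

In this example agent `b` declares a dependency on `a` but is itself depended on by `c`, so it stays frozen at its intention; only `c` re-decides. Because every intention that is actually consumed belongs to a frozen agent, information flows in one direction only, which is the property the abstract attributes to 2SIS.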
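Keywords: multi-agent systems; deep reinforcement learning; deep multi-agent reinforcement learning; communication; intention sharing; cooperation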
Classification No.: TP311 [Automation and Computer Technology: Computer Software and Theory]