Survey on multi-agent reinforcement learning methods from the perspective of population

Cited by: 2


Authors: XIANG Fengtao; LUO Junren; GU Xueqiang; SU Jiongming; ZHANG Wanpeng (College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China)

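Affiliation: [1] College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, Hunan, China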

Source: Chinese Journal of Intelligent Science and Technology, 2023, No. 3, pp. 313-329 (17 pages)

Funding: National Natural Science Foundation of China (No. 61603403, No. U1734208); Natural Science Foundation of Hunan Province (No. 2021JJ40693).

Abstract: Multi-agent systems are a frontier research topic in distributed artificial intelligence. Traditional multi-agent reinforcement learning methods focus mainly on the emergence of group behavior, multi-agent cooperation and coordination, communication between agents, and opponent modeling and prediction. They still face challenges, however, including partially observable environments, non-stationary opponent strategies, high-dimensional decision spaces, and credit assignment that is difficult to interpret. How to design multi-agent reinforcement learning methods that scale to large numbers of agents and adapt to many different application scenarios is a frontier problem in this field. This paper first outlines research progress in multi-agent reinforcement learning. It then surveys multi-agent learning methods of multiple types and paradigms from two perspectives, scalability and population adaptation. Four categories of scalable learning methods are systematically reviewed: set permutation invariance, attention mechanisms, graph and network theory, and mean field theory. Four categories of population-adaptive reinforcement learning methods are reviewed as well: transfer learning, curriculum learning, meta-learning, and meta-games, with typical application scenarios given. Finally, frontier research directions are discussed from five aspects: benchmark platform development, bilevel optimization architectures, adversarial policy learning, human-machine collaborative value alignment, and adaptive game decision-making loops. The survey can serve as a reference for research on key frontier problems of multi-agent reinforcement learning in multimodal environments.

Keywords: distributed intelligence; mean field theory; graph neural network; meta-learning; meta-game

Classification: TP39 [Automation and Computer Technology: Computer Application Technology]

 
