Reinforcement learning and adaptive/approximate dynamic programming: A survey from theory to applications in multi-agent systems  (Cited by: 11)


Authors: WEN Guang-hui (温广辉); YANG Tao (杨涛); ZHOU Jia-ling (周佳玲); FU Jun-jie (付俊杰); XU Lei (徐磊) (Department of Systems Science, Southeast University, Nanjing 211189, China; State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, China; Advanced Research Institute of Multidisciplinary Sciences, Beijing Institute of Technology, Beijing 100081, China)

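Affiliations: [1] Department of Systems Science, Southeast University, Nanjing 211189, China [2] State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, China [3] Advanced Research Institute of Multidisciplinary Sciences, Beijing Institute of Technology, Beijing 100081, China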

Source: Control and Decision (《控制与决策》), 2023, No. 5, pp. 1200-1230 (31 pages)

Funding: National Natural Science Foundation of China (U22B2046, 62073079, 62088101, 62133003, 61991403, 62173085, 62003167); Equipment Pre-research Joint Fund of the Ministry of Education (8091B022114).

Abstract: Reinforcement learning (RL) and adaptive/approximate dynamic programming (ADP) algorithms have recently attracted considerable attention in artificial intelligence, systems and control, and applied mathematics. This is in part because of their rapid development and their successful application to a series of challenging problems, such as optimal decision-making and optimal coordination control of large-scale multi-agent systems. This paper first briefly introduces the preliminaries and core ideas of RL and ADP algorithms. It then reviews how these two closely related classes of algorithms have developed in different research fields, with emphasis on the progression from sequential decision (optimal control) problems for a single agent (control plant) to sequential decision (optimal coordination control) problems for multi-agent systems. Furthermore, after briefly surveying the structural evolution of ADP algorithms and their development from a model-based offline planning framework to a model-free online learning framework, the paper reviews research progress on ADP algorithms for the optimal coordination control of multi-agent systems. Finally, it suggests some challenging open issues concerning multi-agent reinforcement learning (MARL) algorithms and the use of ADP algorithms to solve optimal coordination control problems of multi-agent systems.

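Keywords: reinforcement learning; adaptive dynamic programming; multi-agent systems; Markov decision process; sequential decision-making; optimal coordination control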

Classification number: TP273 [Automation and computer technology: detection technology and automation devices]

 
