Authors: 温广辉 (WEN Guang-hui); 杨涛 (YANG Tao); 周佳玲 (ZHOU Jia-ling); 付俊杰 (FU Jun-jie); 徐磊 (XU Lei)
Affiliations: [1] Department of Systems Science, Southeast University, Nanjing 211189, China; [2] State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, China; [3] Advanced Research Institute of Multidisciplinary Sciences, Beijing Institute of Technology, Beijing 100081, China
Source: 《控制与决策》 (Control and Decision), 2023, Issue 5, pp. 1200-1230 (31 pages)
Funding: National Natural Science Foundation of China (U22B2046, 62073079, 62088101, 62133003, 61991403, 62173085, 62003167); Joint Fund for Equipment Pre-research of the Ministry of Education (8091B022114).
Abstract: Reinforcement learning (RL) and adaptive/approximate dynamic programming (ADP) algorithms have recently received much attention from various scientific fields (e.g., artificial intelligence, systems and control, and applied mathematics). This is partly due to their successful applications to a series of challenging problems, such as the sequential decision and optimal coordination control problems of large-scale multi-agent systems. In this paper, some preliminaries on RL and ADP algorithms are first introduced, and the developments of these two closely related algorithms in different research fields are then reviewed, with emphasis on the progression from solving the sequential decision (optimal control) problem for a single agent (control plant) to solving the sequential decision (optimal coordination control) problem for multi-agent systems. Furthermore, after briefly surveying the structural evolution of the ADP algorithm over the last decades and its recent development from a model-based offline programming framework to a model-free online learning framework, the research progress of ADP in solving the optimal coordination control problem of multi-agent systems is reviewed. Finally, some interesting yet challenging issues concerning multi-agent reinforcement learning (MARL) algorithms and the use of ADP to solve optimal coordination control problems of multi-agent systems are suggested.
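As illustrative background for the generic RL setting the survey covers (this is a sketch of standard tabular Q-learning on a hypothetical five-state chain MDP, not an algorithm taken from the paper itself):

```python
import random

# Minimal tabular Q-learning on a toy 5-state chain MDP: the agent
# starts at state 0 and earns reward 1 on reaching the terminal
# state 4. Illustrative only; all names and parameters are assumed.
N_STATES = 5          # states 0..4; state 4 is terminal
ACTIONS = [0, 1]      # 0 = move left, 1 = move right
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1

def step(s, a):
    """Deterministic chain dynamics: right moves toward the goal."""
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, r, s2 == N_STATES - 1

random.seed(0)
Q = [[0.0, 0.0] for _ in range(N_STATES)]
for _ in range(500):                       # training episodes
    s = 0
    while True:
        # epsilon-greedy action selection
        a = random.choice(ACTIONS) if random.random() < EPS \
            else max(ACTIONS, key=lambda x: Q[s][x])
        s2, r, done = step(s, a)
        # Q-learning update: bootstrap on the greedy successor value
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
        s = s2
        if done:
            break

# Greedy policy after training for the non-terminal states.
greedy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES - 1)]
print(greedy)
```

With these settings the learned greedy policy moves right in every non-terminal state; the same bootstrapped value-update idea underlies the ADP structures (critic and actor approximators) discussed in the survey.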
Keywords: reinforcement learning; adaptive dynamic programming; multi-agent systems; Markov decision process; sequential decision; optimal coordination control
Classification: TP273 [Automation and Computer Technology - Detection Technology and Automation Devices]