Authors: HUANG Kailiang; WANG Li [1]
Affiliation: [1] School of Computer Science and Engineering, Tianjin University of Technology, Tianjin 300384, China
Source: Journal of Tianjin University of Technology, 2023, No. 6, pp. 26-33 (8 pages)
Funding: National Natural Science Foundation of China (61403280, 61773286).
Abstract: For the optimal consensus control problem of discrete-time multi-agent systems with unknown dynamics, a novel policy iteration algorithm is proposed, and the Hamilton-Jacobi-Bellman (HJB) equation is used to solve the problem. In practical applications, the complexity and unknown nature of multi-agent systems make a complete dynamic model unobtainable, so the solution of the HJB equation cannot be found by general mathematical methods. To overcome these difficulties, the new method relies on reinforcement learning and needs only the state-information error between an agent and its neighbors to approximate the optimal control policy. A convergence proof for the policy iteration algorithm is given, showing theoretically that the method can solve the optimal consensus control problem for unknown multi-agent systems. Simulation experiments show that the proposed algorithm is more efficient than the traditional policy iteration algorithm.
Keywords: multi-agent systems; optimal consensus control; policy iteration; reinforcement learning
Classification: TP273.1 [Automation and Computer Technology: Detection Technology and Automatic Devices]