Intelligent critic design for asymmetric constrained zero-sum games without relying on initial admissible control

Authors: LI Meng-hua; WANG Ding; ZHAO Ming-ming; QIAO Jun-fei [1,2,3,4]

Affiliations: [1] School of Information Science and Technology, Beijing University of Technology, Beijing 100124, China; [2] Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; [3] Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China; [4] Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China

Source: Control and Decision (《控制与决策》), 2025, No. 4, pp. 1347-1356 (10 pages)

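Funding: National Natural Science Foundation of China (62222301, 61890930-5, 62021003); National Science and Technology Major Project for New Generation Artificial Intelligence (2021ZD0112302, 2021ZD0112301); Beijing Natural Science Foundation (JQ19013).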

Abstract: The continuous-time zero-sum game problem with asymmetric constraints is investigated using the adaptive critic control approach. First, a novel nonquadratic function is established to handle the asymmetric constraints, which relaxes the restriction on the control matrix. Second, the optimal control, the worst disturbance, and the Hamilton-Jacobi-Isaacs equation are derived. An adaptive critic control method is then constructed to approximate the optimal cost function, yielding the near-optimal control and the near-worst disturbance. For the zero-sum game problem with asymmetric constraints, an innovative critic learning criterion is proposed to strengthen the learning process and eliminate the dependence on an initial admissible control, which the authors state has not been considered in previous papers. Moreover, the stability of the system state and of the critic network weight estimation error is proved using the Lyapunov method. Finally, the effectiveness of the proposed algorithm is verified on two examples, the F-16 aircraft and the inverted pendulum. For comparison, simulation results under the traditional learning algorithm are also provided to further illustrate the feasibility of the proposed learning criterion.
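For orientation, the quantities named in the abstract can be written in the textbook form of a continuous-time zero-sum game. This is only a sketch under stated assumptions: the symbols $f$, $g$, $k$, $Q$, $U$, and $\gamma$ are illustrative and do not come from the abstract, and $U(u)$ stands in for the paper's nonquadratic control penalty, whose exact form the abstract does not give.

```latex
\begin{align}
% Assumed input-affine dynamics with control u and disturbance w
\dot{x} &= f(x) + g(x)\,u + k(x)\,w \\
% Cost with a generic control penalty U(u); the paper's asymmetric-constraint form is not reproduced here
J(x_0) &= \int_{0}^{\infty} \bigl( x^{\top} Q x + U(u) - \gamma^{2} w^{\top} w \bigr)\,\mathrm{d}t \\
% Optimal cost: the controller minimizes, the disturbance maximizes
V^{*}(x) &= \min_{u}\,\max_{w}\, J \\
% Worst disturbance, from stationarity of the Hamiltonian with respect to w
w^{*}(x) &= \frac{1}{2\gamma^{2}}\, k^{\top}(x)\,\nabla V^{*}(x) \\
% Hamilton-Jacobi-Isaacs equation evaluated at the saddle point (u^*, w^*)
0 &= x^{\top} Q x + U(u^{*}) - \gamma^{2} w^{*\top} w^{*}
     + \nabla V^{*\top}(x)\,\bigl( f(x) + g(x)\,u^{*} + k(x)\,w^{*} \bigr)
\end{align}
```

In this generic setting, a critic network approximates $V^{*}$, and the near-optimal control and near-worst disturbance follow by substituting the approximate gradient for $\nabla V^{*}$. The optimal control $u^{*}$ depends on the specific choice of $U$ and is therefore left implicit.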

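Keywords: adaptive critic design; adaptive dynamic programming; continuous-time systems; zero-sum games; asymmetric constraints; initial admissible control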

Classification number: TP273 [Automation and Computer Technology: Detection Technology and Automation Devices]

 
