An Online Q-Learning Method for Linear-Quadratic Nonzero-Sum Stochastic Differential Games with Completely Unknown Dynamics  


Authors: ZHANG Bao-Qiang, WANG Bing-Chang, CAO Ying

Affiliation: [1] School of Control Science and Engineering, Shandong University, Jinan 250000, China

Source: Journal of Systems Science & Complexity (系统科学与复杂性学报, English edition), 2024, No. 5, pp. 1907-1922 (16 pages)

Funding: Supported in part by the National Natural Science Foundation of China under Grant Nos. 62122043 and 62192753; in part by the Natural Science Foundation of Shandong Province for Distinguished Young Scholars under Grant No. ZR2022JQ31; and in part by the Innovative Research Groups of the National Natural Science Foundation of China under Grant No. 61821004.

Abstract: In this paper, the authors design a reinforcement learning algorithm to solve the adaptive linear-quadratic stochastic n-player nonzero-sum differential game with completely unknown dynamics. For each player, a critic network estimates the Q-function and an actor network estimates the control input. The result is a model-free online Q-learning algorithm for this class of problems. It is proved that, under some mild conditions, the system state and the weight estimation errors are uniformly ultimately bounded. A simulation with five players is given to verify the effectiveness of the algorithm.

Keywords: Actor-critic algorithm model-free adaptive control nonzero-sum stochastic game reinforcement learning

Classification codes: O225 [Science: Operations Research and Cybernetics]; TP181 [Science: Mathematics]

 
