Proof of Convergence for the Stochastic Linear Quadratic Optimal Control Problem with Entropy


Author: Shu Xin (舒心)

Affiliation: [1] College of Science, University of Shanghai for Science and Technology, Shanghai

Source: Pure Mathematics (《理论数学》), 2023, No. 3, pp. 659-668 (10 pages)

Abstract: This paper uses a matrix transformation to rewrite the solution of the stochastic linear quadratic optimal control problem with entropy in an equivalent form. It then proves that the quadratic-term coefficient of the linear quadratic equation has a unique solution and that the associated iteration converges. The first-order coefficient is 0, the constant term depends only on the quadratic term, and the optimal probability distribution of the control process likewise depends only on the quadratic term. Next, expectations are estimated by the mean of Monte Carlo random samples, which yields Algorithm 1. The iteration in Algorithm 1 is shown to fluctuate, and the size of the fluctuation depends on the variance of the random parameters and on the number of Monte Carlo samples: the more samples, the smaller the variance of the fluctuation. Finally, two numerical examples compare the stochastic-approximation Q-learning algorithm with the Monte Carlo Q-learning algorithm. For the same number of iterations, stochastic-approximation Q-learning takes less computation time but has a larger error, whereas Monte Carlo Q-learning converges faster and more stably, and its error can be reduced further by drawing more random samples.
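The abstract's claim that more Monte Carlo samples give a smaller fluctuation variance can be illustrated with a generic sample-mean estimator. The sketch below is not the paper's Algorithm 1: the target quantity E[w^2] with w ~ N(0, sigma^2), the function names `mc_mean` and `estimator_variance`, and all parameter values are assumptions chosen only for illustration. For this toy target the variance of the sample mean is 2*sigma^4 / N, so a tenfold increase in samples should shrink it by roughly a factor of ten.

```python
import random
import statistics


def mc_mean(rng, n_samples, sigma=1.0):
    """Sample-mean estimate of E[w^2] for w ~ N(0, sigma^2) (toy target, assumed)."""
    return sum(rng.gauss(0.0, sigma) ** 2 for _ in range(n_samples)) / n_samples


def estimator_variance(n_samples, n_trials=2000, sigma=1.0, seed=0):
    """Empirical variance of the sample-mean estimator over repeated trials."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    estimates = [mc_mean(rng, n_samples, sigma) for _ in range(n_trials)]
    return statistics.pvariance(estimates)


# Compare the fluctuation of the estimator for two sample sizes.
var_small = estimator_variance(n_samples=10)
var_large = estimator_variance(n_samples=100)
print(f"N=10:  variance of estimate = {var_small:.4f}")
print(f"N=100: variance of estimate = {var_large:.4f}")
```

The same trade-off applies when such an estimate sits inside an iteration: each extra sample costs computation time but reduces the noise injected at every step.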

Keywords: stochastic linear quadratic optimal control; convergence; Q-learning; Monte Carlo; stochastic approximation

Classification code: TP3 [Automation and Computer Technology: Computer Science and Technology]

 
