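Author: Shu Xin (舒心)
Affiliation: [1] College of Science, University of Shanghai for Science and Technology, Shanghai
Source: Pure Mathematics (《理论数学》), 2023, No. 3, pp. 659-668 (10 pages)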
Abstract: In this paper, a matrix transformation is used to convert the solution of the stochastic linear quadratic optimal control problem with entropy into an equivalent form. We then prove that the solution for the quadratic-term coefficient of the linear quadratic equation is unique and that the iterative formula converges. The results show that the first-order coefficient is 0, that the constant-term coefficient depends only on the quadratic term, and that the optimal probability distribution of the control process also depends only on the quadratic term. Next, the expected value is estimated by the mean of Monte Carlo random samples, which leads to Algorithm 1. We prove that the iterative formula in Algorithm 1 fluctuates, and that the size of the fluctuation depends on both the variance of the random parameters and the number of Monte Carlo samples: the more samples are used, the smaller the variance of the fluctuation. Finally, two numerical cases compare the stochastic approximation Q-learning algorithm with the Monte Carlo Q-learning algorithm. For the same number of iterations, stochastic approximation Q-learning takes less computation time but has a larger error, while Monte Carlo Q-learning converges faster and more stably, and its error can be reduced further by increasing the number of randomly drawn samples.
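The abstract's claim that more Monte Carlo samples give a smaller fluctuation variance can be illustrated with a generic sample-mean estimator. The sketch below is not the paper's Algorithm 1: the function names, the standard normal test distribution, and the sample sizes are all assumptions chosen for illustration. It only shows the general fact that the variance of a sample mean of N independent draws scales like sigma^2 / N.

```python
import random
import statistics


def mc_mean(sample_size, rng, mu=0.0, sigma=1.0):
    """Estimate E[X] for X ~ N(mu, sigma^2) by the mean of `sample_size` draws.

    The normal distribution is an illustrative assumption, not the paper's model.
    """
    return sum(rng.gauss(mu, sigma) for _ in range(sample_size)) / sample_size


def estimator_variance(sample_size, repeats=500, seed=0):
    """Empirical variance of the Monte Carlo estimate over repeated runs."""
    rng = random.Random(seed)  # fixed seed so the experiment is reproducible
    estimates = [mc_mean(sample_size, rng) for _ in range(repeats)]
    return statistics.pvariance(estimates)


# With sigma = 1, theory predicts a variance of about 1 / sample_size.
var_small = estimator_variance(10)
var_large = estimator_variance(1000)
print(f"N=10:   variance of estimate = {var_small:.5f}")
print(f"N=1000: variance of estimate = {var_large:.5f}")
```

Raising the sample size by a factor of 100 should shrink the empirical variance by roughly the same factor, which matches the qualitative statement in the abstract about sample count and fluctuation.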
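Keywords: stochastic linear quadratic optimal control; convergence; Q-learning; Monte Carlo; stochastic approximation
Classification: TP3 [Automation and Computer Technology: Computer Science and Technology]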