Optimal control of a class of nonlinear dynamic systems based on reinforcement learning (cited by: 9)

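Affiliations: [1] School of Applied Mathematics, Guangdong University of Technology, Guangzhou 510006 [2] School of Computers, Guangdong University of Technology, Guangzhou 510006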

Source: Control and Decision (《控制与决策》), 2013, No. 12, pp. 1889-1893 (5 pages)

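Funding: National Natural Science Foundation of China (60974019; 61273118); Guangdong Province Higher Education High-Level Talent Project; Natural Science Foundation of Guangdong Province (S2012010010570)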

Abstract: An optimal control method based on Euler reinforcement learning (ERL) is proposed for a class of nonlinear uncertain dynamic systems. The method uses the ERL algorithm to estimate the unknown nonlinear functions of the plant and derives online learning rules for iterating the reward function and the policy function. The temporal-difference (TD) error that arises during learning is discretized with the forward Euler difference formula, which allows the value function to be estimated and the control policy to be improved. Based on the gradient of the value function and the TD error performance index, the steps of the algorithm and an error estimation theorem are given. Simulation results for the mountain-car problem demonstrate the effectiveness of the proposed method.
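The abstract does not state the equations, so the following is only a minimal sketch of what a forward-Euler discretization of a TD error can look like. It assumes the commonly used continuous-time form delta(t) = r(t) - V(t)/tau + dV/dt, with dV/dt replaced by the forward difference (V(t+dt) - V(t))/dt. The time constant `tau`, the function names, and the single-state toy problem are all hypothetical illustrations and are not taken from the paper's ERL algorithm.

```python
def td_error_forward_euler(reward, v_now, v_next, dt, tau):
    # Assumed continuous-time TD error: delta = r - V/tau + dV/dt,
    # with dV/dt approximated by the forward difference (V(t+dt) - V(t)) / dt.
    return reward - v_now / tau + (v_next - v_now) / dt


def learn_constant_reward_value(reward=1.0, tau=2.0, dt=0.01, lr=0.5, steps=200):
    # Hypothetical toy problem: one state with a constant reward, so
    # V(t) = V(t+dt) = w and the discounted return equals tau * reward.
    w = 0.0
    for _ in range(steps):
        delta = td_error_forward_euler(reward, w, w, dt, tau)
        w += lr * delta  # the gradient of V with respect to w is 1 here
    return w


if __name__ == "__main__":
    # The estimate should approach tau * reward as the TD error goes to zero.
    print(learn_constant_reward_value())
```

In this toy setting the TD error vanishes exactly when the value estimate equals `tau * reward`, which is the fixed point the update converges to. A full controller would additionally need a function approximator for the value function and a policy improvement step, neither of which is specified in the abstract.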

Keywords: nonlinear dynamic systems; reinforcement learning; optimal control; value function; policy function

Classification: TP273 [Automation and Computer Technology: Detection Technology and Automation Devices]
