Authors: LI Jin-na (李金娜); YIN Zi-xuan (尹子轩)
Affiliations: [1] College of Information Engineering, Shenyang University of Chemical Technology, Shenyang 110142, China; [2] School of Information and Control Engineering, Liaoning Shihua University, Fushun 113001, China; [3] State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110004, China
Source: Control and Decision (《控制与决策》), 2019, No. 11, pp. 2343-2349 (7 pages)
Funding: National Natural Science Foundation of China (61673280, 61525302, 61590922, 61503257); Program for Innovative Talents in Universities of Liaoning Province (LR2017006); Joint Open Fund for Key Fields of the Liaoning Provincial Natural Science Foundation Program (2019-KF-03-06); Liaoning Shihua University Research Fund (2018XJJ-005)
Abstract: This paper develops a novel off-policy Q-learning method for solving the linear quadratic tracking (LQT) problem in discrete-time networked control systems with packet dropout. The proposed method can be implemented using measured data without requiring the system dynamics to be known a priori, and it tolerates bounded packet loss. First, the networked control system with packet dropout is characterized, and the optimal tracking problem for linear discrete-time networked control systems is formulated. Then, a Smith predictor is designed to predict the current state from historical data measured over the communication network, and on this basis an optimal tracking problem with packet-dropout compensation is formulated. Finally, a novel off-policy Q-learning algorithm is developed by integrating dynamic programming with reinforcement learning. The merit of the proposed algorithm is that the optimal tracking control law, based on predicted system states, can be learned using only measured data, without knowledge of the system dynamics; moreover, the off-policy approach guarantees the unbiasedness of the solution to the Q-function-based Bellman equation. Simulation results show that the proposed method achieves good tracking performance for networked control systems with unknown dynamics and packet dropout.
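The abstract walks through three steps: formulating the LQT problem for a networked system with packet dropout, compensating the dropout with a Smith predictor, and solving the compensated problem by off-policy Q-learning. The sketch below is a minimal, hypothetical illustration of that last step on a toy second-order plant: the matrices, weights, and helper names (step, features, predict_state) are assumptions for illustration, not the paper's algorithm or its simulation example, and the dropout compensation is only hinted at by the standalone predict_state helper.

```python
import numpy as np

# Illustrative plant (A, B, C), reference model F, and weights; these are
# assumed values for the sketch, NOT the example from the paper.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
C = np.array([[1.0, 0.0]])
F = np.array([[1.0]])            # reference dynamics r_{k+1} = F r_k (setpoint)
Q, R, gamma = 10.0, 1.0, 0.9     # tracking weight, control weight, discount

n, m, p = 2, 1, 1
nz = n + p                       # augmented state z = [x; r]

def cost(z, u):
    """One-step LQT cost (Cx - r)'Q(Cx - r) + u'Ru on the augmented state."""
    e = C @ z[:n] - z[n:]
    return float(e @ (Q * e) + u @ (R * u))

def step(z, u):
    """Augmented dynamics z_{k+1} = [A x + B u; F r]."""
    return np.concatenate([A @ z[:n] + B @ u, F @ z[n:]])

def predict_state(x_last, u_hist):
    """Smith-style open-loop prediction across dropped packets. Illustration
    only: the paper's predictor works from measured network data, whereas
    this helper assumes the nominal model (A, B) is available."""
    x = x_last
    for u in u_hist:
        x = A @ x + B @ u
    return x

def features(z, u):
    """Quadratic basis so that Q(z, u) = w' H w with w = [z; u]."""
    w = np.concatenate([z, u])
    return np.kron(w, w)

# Collect exploratory data ONCE with a random behavior policy; the same batch
# is reused for every policy evaluation below, which is what makes the
# scheme off-policy.
rng = np.random.default_rng(0)
data, z = [], np.array([1.0, 0.0, 1.0])
for _ in range(400):
    u = rng.normal(size=m)
    z1 = step(z, u)
    data.append((z, u, cost(z, u), z1))
    z = z1 if np.all(np.abs(z1) < 50.0) else np.array([1.0, 0.0, 1.0])

# Policy iteration on the Q-function: least-squares evaluation + greedy update.
K = np.zeros((m, nz))            # target policy u = -K z
for _ in range(30):
    # Q_K(z_k, u_k) = c_k + gamma * Q_K(z_{k+1}, -K z_{k+1}), linear in theta.
    Phi = np.array([features(zk, uk) - gamma * features(zk1, -K @ zk1)
                    for zk, uk, ck, zk1 in data])
    c = np.array([ck for _, _, ck, _ in data])
    theta, *_ = np.linalg.lstsq(Phi, c, rcond=None)
    H = theta.reshape(nz + m, nz + m)
    H = 0.5 * (H + H.T)          # symmetrize the Q-function kernel
    K_new = np.linalg.solve(H[nz:, nz:], H[nz:, :nz])  # argmin_u Q(z, u)
    if np.linalg.norm(K_new - K) < 1e-8:
        break
    K = K_new

print("learned tracking feedback gain K =", K)
```

The off-policy structure shows up in the evaluation step: the actions in the data come from the random behavior policy, while the target policy -Kz is substituted only at the next state. This is why the exploration noise does not bias the least-squares Bellman solution, which matches the unbiasedness property the abstract claims for the Q-function-based Bellman equation.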
Keywords: networked control; off-policy Q-learning; linear quadratic tracking (LQT); packet dropout
CLC Number: TP13 [Automation and Computer Technology: Control Theory and Control Engineering]