Authors: LI Jin-na (李金娜); YIN Zi-xuan (尹子轩)
Affiliations: [1] College of Information Engineering, Shenyang University of Chemical Technology, Shenyang 110142, China; [2] School of Information and Control Engineering, Liaoning Shihua University, Fushun 113001, China; [3] State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110004, China
Source: Control and Decision (《控制与决策》), 2019, No. 11, pp. 2343-2349 (7 pages)
Funding: National Natural Science Foundation of China (61673280, 61525302, 61590922, 61503257); Program for Innovative Talents in Universities of Liaoning Province (LR2017006); Joint Open Fund for Key Fields of the Liaoning Provincial Natural Science Foundation Program (2019-KF-03-06); Liaoning Shihua University Research Fund (2018XJJ-005)
Abstract: This paper develops a novel off-policy Q-learning method for solving the linear quadratic tracking (LQT) problem in discrete-time networked control systems with packet dropout. The proposed method can be implemented using measured data without requiring the system dynamics to be known a priori, and it tolerates bounded packet loss. First, the networked control system with packet dropout is characterized, and the optimal tracking problem for linear discrete-time networked control systems is formulated. Then, a Smith predictor is designed to predict the current state from historical data measured over the communication network, and on this basis an optimal tracking problem with packet-dropout compensation is formulated. Finally, a novel off-policy Q-learning algorithm is developed by integrating dynamic programming with reinforcement learning. The merit of the proposed algorithm is that the optimal tracking control law, based on predicted system states, can be learned using only measured data, without knowledge of the system dynamics; moreover, the off-policy approach guarantees the unbiasedness of the solution to the Q-function-based Bellman equation. Simulation results show that the proposed method achieves good tracking performance for networked control systems with unknown dynamics and packet dropout.
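The abstract walks through three steps: formulating the LQT problem for a networked system with packet dropout, compensating the dropout with a Smith predictor, and solving the compensated problem by off-policy Q-learning. The sketch below is a minimal, hypothetical illustration of that last step on a toy second-order plant: the matrices, weights, and helper names (step, features, predict_state) are assumptions for illustration, not the paper's algorithm or its simulation example, and the dropout compensation is only hinted at by the standalone predict_state helper.

```python
import numpy as np

# Illustrative plant (A, B, C), reference model F, and weights; these are
# assumed values for the sketch, NOT the example from the paper.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
C = np.array([[1.0, 0.0]])
F = np.array([[1.0]])            # reference dynamics r_{k+1} = F r_k (setpoint)
Q, R, gamma = 10.0, 1.0, 0.9     # tracking weight, control weight, discount

n, m, p = 2, 1, 1
nz = n + p                       # augmented state z = [x; r]

def cost(z, u):
    """One-step LQT cost (Cx - r)'Q(Cx - r) + u'Ru on the augmented state."""
    e = C @ z[:n] - z[n:]
    return float(e @ (Q * e) + u @ (R * u))

def step(z, u):
    """Augmented dynamics z_{k+1} = [A x + B u; F r]."""
    return np.concatenate([A @ z[:n] + B @ u, F @ z[n:]])

def predict_state(x_last, u_hist):
    """Smith-style open-loop prediction across dropped packets. Illustration
    only: the paper's predictor works from measured network data, whereas
    this helper assumes the nominal model (A, B) is available."""
    x = x_last
    for u in u_hist:
        x = A @ x + B @ u
    return x

def features(z, u):
    """Quadratic basis so that Q(z, u) = w' H w with w = [z; u]."""
    w = np.concatenate([z, u])
    return np.kron(w, w)

# Collect exploratory data ONCE with a random behavior policy; the same batch
# is reused for every policy evaluation below, which is what makes the
# scheme off-policy.
rng = np.random.default_rng(0)
data, z = [], np.array([1.0, 0.0, 1.0])
for _ in range(400):
    u = rng.normal(size=m)
    z1 = step(z, u)
    data.append((z, u, cost(z, u), z1))
    z = z1 if np.all(np.abs(z1) < 50.0) else np.array([1.0, 0.0, 1.0])

# Policy iteration on the Q-function: least-squares evaluation + greedy update.
K = np.zeros((m, nz))            # target policy u = -K z
for _ in range(30):
    # Q_K(z_k, u_k) = c_k + gamma * Q_K(z_{k+1}, -K z_{k+1}), linear in theta.
    Phi = np.array([features(zk, uk) - gamma * features(zk1, -K @ zk1)
                    for zk, uk, ck, zk1 in data])
    c = np.array([ck for _, _, ck, _ in data])
    theta, *_ = np.linalg.lstsq(Phi, c, rcond=None)
    H = theta.reshape(nz + m, nz + m)
    H = 0.5 * (H + H.T)          # symmetrize the Q-function kernel
    K_new = np.linalg.solve(H[nz:, nz:], H[nz:, :nz])  # argmin_u Q(z, u)
    if np.linalg.norm(K_new - K) < 1e-8:
        break
    K = K_new

print("learned tracking feedback gain K =", K)
```

The off-policy structure shows up in the evaluation step: the actions in the data come from the random behavior policy, while the target policy -Kz is substituted only at the next state. This is why the exploration noise does not bias the least-squares Bellman solution, which matches the unbiasedness property the abstract claims for the Q-function-based Bellman equation.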
Keywords: networked control; off-policy Q-learning; linear quadratic tracking (LQT); packet dropout
CLC Number: TP13 [Automation and Computer Technology: Control Theory and Control Engineering]