Authors: LI Zhen (李臻), FAN Jia-Lu (范家璐), JIANG Yi (姜艺), CHAI Tian-You (柴天佑)
Affiliation: [1] State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819
Source: Acta Automatica Sinica (《自动化学报》), 2021, No. 9, pp. 2182-2193 (12 pages)
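Funding: Supported by the National Natural Science Foundation of China (61533015, 61304028) and the Liaoning Revitalization Talents Program (XLYC2007135).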
Abstract: For the regulation problem of linear discrete-time systems with unknown models in the presence of disturbances, this paper proposes an H∞ control method based on input-output data feedback and off-policy learning. Starting from the online state-feedback policy iteration (PI) learning algorithm, and because state data are difficult to measure while the system is running, the paper introduces an augmented data vector that converts the online state-feedback PI algorithm into an online input-output data-feedback learning algorithm. By further introducing auxiliary terms, the online input-output data-feedback PI algorithm is turned into a model-free input-output data-feedback off-policy learning algorithm. This algorithm learns the optimal output-feedback policy from historical input-output data, which removes the on-policy algorithm's need for frequent interaction with the real environment. In addition, compared with the on-policy algorithm, the off-policy algorithm overcomes the influence of learning (exploration) noise, so that the learned result converges to the theoretical optimum. Finally, simulation experiments verify the convergence of the learning algorithm.
Keywords: H∞ control; reinforcement learning; off-policy; data-driven
CLC number: TP273 [Automation and Computer Technology: Detection Technology and Automatic Equipment]
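The abstract rests on two ideas: the state of an observable linear system can be written as a linear function of recent inputs and outputs (the "augmented data vector"), and the learned policy should converge to the theoretical H∞ optimum. The sketch below illustrates both with a *known* model. It is not the paper's model-free off-policy algorithm. It assumes the standard construction x_k = M_y ȳ + M_v v̄ for an observable pair (A, C) with horizon N at least the observability index, and it assumes past disturbance samples are recorded alongside past inputs in v̄. The system matrices, weights, γ, and the helper names `io_reconstruction` and `hinf_policy_iteration` are hypothetical. The policy iteration starts from K = L = 0, so it assumes A is already stable.

```python
import numpy as np

def solve_lyap(Ac, M):
    """Solve P = M + Ac^T P Ac by vectorization (adequate for small n)."""
    n = Ac.shape[0]
    lhs = np.eye(n * n) - np.kron(Ac.T, Ac.T)
    P = np.linalg.solve(lhs, M.reshape(-1)).reshape(n, n)
    return 0.5 * (P + P.T)  # symmetrize against round-off

def hinf_policy_iteration(A, B, E, Q, R, gamma, iters=50, tol=1e-10):
    """Model-based PI for the zero-sum game: u = -K x, w = -L x.
    Starts from K = L = 0, so A itself must be stable (assumption)."""
    n, m = B.shape
    q = E.shape[1]
    K, L, P_old = np.zeros((m, n)), np.zeros((q, n)), np.zeros((n, n))
    for _ in range(iters):
        Ac = A - B @ K - E @ L
        # Policy evaluation: Lyapunov equation for the current (K, L).
        P = solve_lyap(Ac, Q + K.T @ R @ K - gamma**2 * L.T @ L)
        # Policy improvement: stationarity of the game Hamiltonian in (u, w).
        G = np.block([[R + B.T @ P @ B, B.T @ P @ E],
                      [E.T @ P @ B, E.T @ P @ E - gamma**2 * np.eye(q)]])
        KL = np.linalg.solve(G, np.vstack([B.T @ P @ A, E.T @ P @ A]))
        K, L = KL[:m], KL[m:]
        if np.max(np.abs(P - P_old)) < tol:
            break
        P_old = P
    return P, K, L

def io_reconstruction(A, Bv, C, N):
    """Return (My, Mv) with x_k = My @ ybar + Mv @ vbar, where
    ybar = [y_{k-1}; ...; y_{k-N}] and vbar = [v_{k-1}; ...; v_{k-N}]."""
    n, p_in = Bv.shape
    p_out = C.shape[0]
    Ap = [np.linalg.matrix_power(A, i) for i in range(N + 1)]
    V = np.vstack([C @ Ap[N - 1 - r] for r in range(N)])  # maps x_{k-N} to ybar
    U = np.hstack([Ap[c] @ Bv for c in range(N)])          # maps vbar into x_k
    T = np.zeros((N * p_out, N * p_in))                    # maps vbar into ybar
    for r in range(N):
        for c in range(r + 1, N):
            T[r*p_out:(r+1)*p_out, c*p_in:(c+1)*p_in] = C @ Ap[c - r - 1] @ Bv
    Vpinv = np.linalg.pinv(V)  # left inverse when V has full column rank
    return Ap[N] @ Vpinv, U - Ap[N] @ Vpinv @ T

# Hypothetical example system: x+ = A x + B u + E w, y = C x.
rng = np.random.default_rng(0)
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
E = np.array([[0.1], [0.1]])
C = np.array([[1.0, 0.0]])
Q, R, gamma, N = C.T @ C, np.eye(1), 5.0, 2

P, K, L = hinf_policy_iteration(A, B, E, Q, R, gamma)
Bv = np.hstack([B, E])  # stack u and w into one input v = [u; w]
My, Mv = io_reconstruction(A, Bv, C, N)

# Simulate with random inputs/disturbances, then rebuild x_k from data.
T_sim = 10
x = np.zeros((T_sim + 1, 2))
x[0] = rng.standard_normal(2)
v = rng.standard_normal((T_sim, 2))  # columns: u_k, w_k
for k in range(T_sim):
    x[k + 1] = A @ x[k] + Bv @ v[k]
y = x @ C.T  # row k holds C x_k
ybar = np.concatenate([y[T_sim - 1 - i] for i in range(N)])
vbar = np.concatenate([v[T_sim - 1 - i] for i in range(N)])
x_hat = My @ ybar + Mv @ vbar
print(np.allclose(x_hat, x[T_sim]))  # True

# Equivalent output-feedback gain on the data vector z = [vbar; ybar].
K_io = K @ np.hstack([Mv, My])
```

Because the reconstruction is exact in the noise-free case, the state-feedback gain K maps directly to a gain K_io on the data vector, so u_k = -K x_k = -K_io z_k. This is why a learner that works only with input-output data can, in principle, reach the same optimum as the state-feedback PI solution.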