Data-driven Policy Optimization for Stochastic Systems Involving Adaptive Critic (融合自适应评判的随机系统数据驱动策略优化)

Authors: WANG Ding (王鼎) [1,2,3,4]; WANG Jiang-Yu (王将宇) [1,2,3,4]; QIAO Jun-Fei (乔俊飞) [1,2,3,4]

Affiliations: [1] Faculty of Information Technology, Beijing University of Technology, Beijing 100124; [2] Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing 100124; [3] Beijing Institute of Artificial Intelligence, Beijing 100124; [4] Beijing Laboratory of Smart Environmental Protection, Beijing 100124

Source: Acta Automatica Sinica (《自动化学报》), 2024, No. 5, pp. 980-990 (11 pages)

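Funding: Supported by the National Natural Science Foundation of China (62222301, 61890930-5, 62021003) and the Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2021ZD0112302, 2021ZD0112301).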

Abstract: Adaptive critic technology has been widely employed to solve the optimal control problems of complicated nonlinear systems, but it still has limitations when applied to the infinite-horizon optimal control problems of discrete-time nonlinear stochastic systems. In this paper, we establish a data-driven discounted optimal regulation method for discrete-time stochastic systems involving adaptive critic technology. First, we investigate the infinite-horizon optimal control problem with a discount factor for nonlinear stochastic systems under relaxed assumptions. The developed stochastic Q-learning algorithm can optimize an initial admissible policy to the optimal one in a monotonically nonincreasing way. Following the data-driven idea, the stochastic Q-learning algorithm performs policy optimization directly from data, without building a dynamic model. Then, the stochastic Q-learning algorithm is implemented with an actor-critic neural network scheme. Finally, two nonlinear benchmark systems are used to demonstrate the effectiveness of the developed stochastic Q-learning algorithm.

Keywords: adaptive critic design; data-driven; discrete-time systems; neural networks; Q-learning; stochastic optimal control

Classification: TP13 [Automation and Computer Technology: Control Theory and Control Engineering]
