Authors: WANG Ding; WANG Jiang-Yu; QIAO Jun-Fei
Affiliations: [1] Faculty of Information Technology, Beijing University of Technology, Beijing 100124 [2] Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing 100124 [3] Beijing Institute of Artificial Intelligence, Beijing 100124 [4] Beijing Laboratory of Smart Environmental Protection, Beijing 100124
Source: Acta Automatica Sinica (《自动化学报》), 2024, No. 5, pp. 980-990 (11 pages)
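Funding: Supported by the National Natural Science Foundation of China (62222301, 61890930-5, 62021003) and the Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2021ZD0112302, 2021ZD0112301).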
Abstract: Adaptive critic techniques have been widely used to solve optimal control problems for complex nonlinear systems, but they remain limited when applied to infinite-horizon optimal control of discrete-time nonlinear stochastic systems. This paper incorporates adaptive critic techniques to establish a data-driven discounted optimal regulation method for discrete-time stochastic systems. First, the infinite-horizon optimal control problem with a discount factor is studied for nonlinear stochastic systems under relaxed assumptions. The proposed stochastic Q-learning algorithm optimizes an initial admissible policy to the optimal policy in a monotonically nonincreasing manner. Following the data-driven idea, the algorithm optimizes the policy directly from data, without building a dynamic model. Second, the stochastic Q-learning algorithm is implemented with actor-critic neural networks. Finally, two nonlinear benchmark systems are used to verify the effectiveness of the proposed algorithm.
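The idea of optimizing a discounted cost from data alone can be illustrated with a small tabular sketch. This is not the paper's algorithm: it uses a lookup table instead of actor-critic neural networks, and it iterates from a zero Q-function instead of an initial admissible policy, so it does not reproduce the monotonically nonincreasing property. The plant `step`, the cost `utility`, the discount `GAMMA`, and all other names are hypothetical choices made only for this illustration.

```python
import random

GAMMA = 0.95                     # discount factor (assumed value)
STATES = [-2, -1, 0, 1, 2]       # toy state grid; the origin is the regulation target
ACTIONS = [-1, 0, 1]             # toy control set


def step(x, u, rng):
    """Hypothetical stochastic plant: the control takes effect with probability 0.8."""
    if rng.random() < 0.8:
        return max(-2, min(2, x + u))
    return x


def utility(x, u):
    """Assumed quadratic stage cost."""
    return x * x + 0.1 * u * u


def collect_data(samples_per_pair=200, seed=0):
    """Sample (x, u, cost, x_next) tuples; the learner sees only these tuples."""
    rng = random.Random(seed)
    data = []
    for x in STATES:
        for u in ACTIONS:
            for _ in range(samples_per_pair):
                data.append((x, u, utility(x, u), step(x, u, rng)))
    return data


def q_iteration(data, sweeps=300):
    """Batch Q-iteration: average the sampled Bellman targets for each (x, u)."""
    q = {(x, u): 0.0 for x in STATES for u in ACTIONS}
    for _ in range(sweeps):
        sums = {key: 0.0 for key in q}
        counts = {key: 0 for key in q}
        for x, u, cost, x_next in data:
            target = cost + GAMMA * min(q[(x_next, a)] for a in ACTIONS)
            sums[(x, u)] += target
            counts[(x, u)] += 1
        q = {key: sums[key] / counts[key] for key in q}
    return q


def greedy_policy(q):
    """Pick the cost-minimizing action in every state."""
    return {x: min(ACTIONS, key=lambda a: q[(x, a)]) for x in STATES}


data = collect_data()
q = q_iteration(data)
policy = greedy_policy(q)
print(policy)
```

The learner never calls `step` directly; it only averages targets built from the recorded tuples, which is the sense in which the sketch is model-free. In this toy setting the greedy policy drives every state toward the origin.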
Keywords: adaptive critic design; data-driven; discrete-time systems; neural networks; Q-learning; stochastic optimal control
Classification: TP13 [Automation and Computer Technology: Control Theory and Control Engineering]