Multi-step Sarsa control algorithm based on RBF neural network  Cited by: 2


Authors: 司彦娜 (SI Yan-na); 普杰信 (PU Jie-xin); 于晓升 (YU Xiao-sheng); 司鹏举 (SI Peng-ju); 孙力帆 (SUN Li-fan) (College of Information Science and Engineering, Henan University of Science and Technology, Luoyang 471023, China; Faculty of Robot Science and Engineering, Northeastern University, Shenyang 110169, China)

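Affiliations: [1] College of Information Science and Engineering, Henan University of Science and Technology, Luoyang 471023, Henan, China [2] Faculty of Robot Science and Engineering, Northeastern University, Shenyang 110169, China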

Source: Control and Decision (《控制与决策》), 2023, No. 4, pp. 944-950 (7 pages)

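Funding: Aeronautical Science Foundation project (20185142003); National Defense Basic Research Program project (JCKY2018419C001); Key Scientific Research Project of Higher Education Institutions of Henan Province (20A120008); Natural Science Foundation of Henan Province project (202300410149).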

Abstract: For model-free nonlinear systems with a continuous state space, a multi-step reinforcement learning control algorithm based on the radial basis function (RBF) neural network is proposed. First, a neural network is introduced into the reinforcement learning system, and the function approximation capability of the RBF neural network is used to approximate the state-action value function, which addresses the problem of representing a continuous state space. Then, the eligibility trace mechanism is incorporated to form the multi-step Sarsa algorithm, Sarsa(λ), which improves the learning efficiency of the system by recording the states it has visited. Finally, the softmax policy is improved with a decaying temperature parameter, which adjusts the action selection probabilities to balance exploration and exploitation. Simulation results on the MountainCar task show that the proposed algorithm can effectively control a continuous nonlinear system without a model after a small amount of training. Compared with the single-step algorithm, the multi-step algorithm requires fewer average steps at convergence to complete the task and performs more stably, which indicates that combining nonlinear value function approximation with a multi-step algorithm can also perform well in control tasks.

Keywords: RBF neural network; reinforcement learning; Sarsa algorithm; continuous space; value function approximation; eligibility trace

Classification number: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]
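
The three ingredients named in the abstract (RBF approximation of the state-action value function, eligibility traces for Sarsa(λ), and a softmax policy with a decaying temperature) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The class name `RBFSarsaLambda`, the helper names, the fixed Gaussian centers, the accumulating traces, the exponential decay schedule, and all default parameter values are assumptions made for illustration. In particular, only the output weights are trained here, whereas the paper may also adapt the RBF layer.

```python
import numpy as np


class RBFSarsaLambda:
    """Sarsa(lambda) with a fixed Gaussian RBF layer (illustrative sketch only)."""

    def __init__(self, centers, sigma, n_actions, alpha=0.1, gamma=0.99, lam=0.9):
        self.centers = np.asarray(centers, dtype=float)  # shape (n_rbf, state_dim)
        self.sigma = float(sigma)                        # shared RBF width (assumed)
        self.alpha, self.gamma, self.lam = alpha, gamma, lam
        self.w = np.zeros((n_actions, len(self.centers)))  # output weights per action
        self.z = np.zeros_like(self.w)                      # eligibility traces

    def features(self, state):
        # Gaussian activation of every RBF unit for the given state.
        d2 = np.sum((self.centers - np.asarray(state, dtype=float)) ** 2, axis=1)
        return np.exp(-d2 / (2.0 * self.sigma ** 2))

    def q_values(self, state):
        # Q(s, a) for all actions as a weighted sum of RBF activations.
        return self.w @ self.features(state)

    def update(self, s, a, r, s_next, a_next, done):
        phi = self.features(s)
        q = self.w[a] @ phi
        q_next = 0.0 if done else self.w[a_next] @ self.features(s_next)
        delta = r + self.gamma * q_next - q   # on-policy TD error
        self.z *= self.gamma * self.lam       # decay all traces
        self.z[a] += phi                      # accumulate trace for the taken action
        self.w += self.alpha * delta * self.z # credit recently visited features
        if done:
            self.z[:] = 0.0                   # reset traces at episode end
        return delta


def softmax_probs(q, temperature):
    # Boltzmann action probabilities; subtracting the max avoids overflow.
    prefs = (np.asarray(q, dtype=float) - np.max(q)) / temperature
    e = np.exp(prefs)
    return e / e.sum()


def decayed_temperature(t0, decay, episode, t_min=0.01):
    # Assumed schedule: exponential decay per episode with a lower bound.
    return max(t_min, t0 * decay ** episode)
```

In a training loop, an action would be drawn with `np.random.default_rng().choice(n_actions, p=softmax_probs(agent.q_values(s), decayed_temperature(t0, decay, episode)))`. A high early temperature keeps the action probabilities close to uniform (exploration), and a low later temperature concentrates them on the highest-valued action (exploitation).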

 
