Reinforcement learning based fractional gradient descent RBF neural network control of inverted pendulum  (Cited by: 3)


Authors: XUE Han (薛晗)[1]; SHAO Zhe-ping (邵哲平)[1]; FANG Qiong-lin (方琼林); LIU Xiao-jia (刘晓佳)[1]

Affiliation: [1] Institute of Navigation, Jimei University, Xiamen 361021, Fujian, China

Source: Control and Decision (《控制与决策》), 2021, No. 1, pp. 125-134 (10 pages)

Funding: National Natural Science Foundation of China (51579114); Natural Science Foundation of Fujian Province (2018J05085).

Abstract: To improve the control performance of reinforcement learning, a reinforcement learning algorithm based on a fractional-order gradient descent RBF neural network is proposed. The reinforcement learning system consists of an evaluation (critic) neural network and an action (actor) neural network. Using the memory and association capabilities of neural networks, it learns to control an inverted pendulum, improving control accuracy and driving the error toward zero until learning succeeds. The stability of the closed-loop system is proved. Physical experiments on the inverted pendulum show that when the fractional order is large, the differential effect is more pronounced, the control of angular velocity and velocity is better, and the mean square error and mean absolute error of angular velocity and velocity are smaller. When the fractional order is small, the integral effect is more pronounced, the control of tilt angle and displacement is better, and the mean square error and mean absolute error of tilt angle and displacement are therefore smaller. Simulation results show that the proposed algorithm has good dynamic response, small overshoot, short settling time, high precision, and good generalization performance. It outperforms both the reinforcement learning algorithm based on the RBF neural network and the traditional reinforcement learning algorithm, effectively accelerating the convergence of gradient descent and improving control performance. After an appropriate disturbance is introduced, the proposed algorithm quickly self-adjusts and returns to a stable state, and the robustness and dynamic performance of the controller meet practical requirements.
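The abstract does not state the weight update rule, so the following is only a minimal sketch of what a fractional-order gradient step on the output weights of an RBF network might look like. It assumes one Caputo-style variant, in which each gradient component is scaled by |w_k - w_{k-1}|^(1-alpha) / Gamma(2-alpha) with 0 < alpha < 1; whether this matches the rule used in the paper is not known from this record. All names (`rbf_features`, `fractional_step`, `train`), the toy regression target, the centers, the width, and the learning rate are hypothetical, and the actor-critic structure and the inverted pendulum dynamics are deliberately left out.

```python
import math

def rbf_features(x, centers, width):
    """Gaussian radial basis activations for a scalar input x."""
    return [math.exp(-((x - c) ** 2) / (2.0 * width ** 2)) for c in centers]

def fractional_step(w, w_prev, grad, lr, alpha, eps=1e-3):
    """One Caputo-style fractional gradient step (assumed form, 0 < alpha < 1).

    Each gradient component is scaled by
    (|w - w_prev| + eps)^(1 - alpha) / Gamma(2 - alpha);
    eps keeps the step from vanishing when a weight did not move.
    """
    scale = 1.0 / math.gamma(2.0 - alpha)
    return [
        wi - lr * gi * scale * (abs(wi - pi) + eps) ** (1.0 - alpha)
        for wi, pi, gi in zip(w, w_prev, grad)
    ]

def mse_and_grad(w, samples, centers, width):
    """Mean squared error of the RBF output layer and its gradient w.r.t. w."""
    n = len(samples)
    loss = 0.0
    grad = [0.0] * len(w)
    for x, y in samples:
        phi = rbf_features(x, centers, width)
        err = sum(wi * p for wi, p in zip(w, phi)) - y
        loss += err * err / n
        for j, p in enumerate(phi):
            grad[j] += 2.0 * err * p / n
    return loss, grad

def train(alpha=0.9, lr=0.3, steps=200):
    """Fit sin(pi * x) on [-1, 1] (a toy target, not the pendulum task)."""
    centers = [-1.0, -0.5, 0.0, 0.5, 1.0]   # hypothetical RBF centers
    width = 0.5                              # hypothetical RBF width
    samples = [(k / 10.0, math.sin(math.pi * k / 10.0)) for k in range(-10, 11)]
    w = [0.0] * len(centers)
    w_prev = [0.1] * len(centers)            # arbitrary previous iterate to start
    history = []
    for _ in range(steps):
        loss, grad = mse_and_grad(w, samples, centers, width)
        history.append(loss)
        w, w_prev = fractional_step(w, w_prev, grad, lr, alpha), w
    return history

if __name__ == "__main__":
    losses = train()
    print("initial loss:", losses[0], "final loss:", losses[-1])
```

In this sketch the order `alpha` only changes how strongly the previous weight movement scales each step. It does not reproduce the differential or integral effects on the pendulum states that the abstract reports.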

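Keywords: reinforcement learning; radial basis function (RBF) neural network; inverted pendulum; fractional order; gradient descent; neural network control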

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]

 
