Research on fuzzy deterministic policy gradient control method for inverted pendulum system (倒立摆模糊确定性策略梯度控制方法研究)

Authors: LI Linxiang (李霖翔); LIU Kainan (刘开南); BAN Xiaojun (班晓军) [1]; FENG Zhichao (冯志超)

Affiliations: [1] Center for Control Theory and Guidance Technology, Harbin Institute of Technology, Harbin 150001, China; [2] School of Missile Engineering, Rocket Force University of Engineering, Xi'an 710025, China

Source: Navigation Positioning and Timing (《导航定位与授时》), 2025, No. 1, pp. 38-49 (12 pages)

Funding: National Natural Science Foundation of China, Young Scientists Fund (62203461).

Abstract: As a typical non-minimum-phase system, the inverted pendulum is markedly nonlinear and unstable, which makes it challenging to control. Traditional deep reinforcement learning controllers for the inverted pendulum suffer from poorly interpretable neural networks and from state variables that are difficult to drive to their desired values. To address these problems, a fuzzy deterministic policy gradient (FDPG) control algorithm is proposed. The algorithm combines the deterministic policy gradient method with a Takagi-Sugeno (T-S) fuzzy model, using the model's function approximation capability to approximate the Actor in the Actor-Critic framework, so that the control policy is expressed directly as fuzzy rules and the controller's practical meaning becomes clearer. In addition, exploiting the interpretability of the T-S fuzzy model, the optimal control law derived from the linear quadratic regulator (LQR) is incorporated into the T-S model as prior knowledge, which ensures the local stability of the controller. Finally, comparison with the traditional deep deterministic policy gradient (DDPG) algorithm and a piecewise fuzzy control method shows that the proposed algorithm offers better control performance and generalization ability on the inverted pendulum system.
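The idea described in the abstract, a T-S fuzzy model standing in for the Actor and seeded with an LQR gain, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the rule base, membership functions, or update rule, so the Gaussian memberships on the pendulum angle, the state ordering `[cart position, cart velocity, angle, angular velocity]`, the choice to freeze the rule nearest upright, and the class name `TSFuzzyActor` are all assumptions made for illustration. The critic gradient `dq_du` is taken as an input and would come from a separately trained Critic.

```python
import numpy as np

class TSFuzzyActor:
    """Illustrative T-S fuzzy actor: u(x) = sum_i w_i(x) * (-K_i @ x)."""

    def __init__(self, centers, sigma, k_lqr, angle_index=2):
        # Rule centres on the pendulum angle (assumed premise variable).
        self.centers = np.asarray(centers, dtype=float)
        self.sigma = float(sigma)
        self.angle_index = angle_index
        # Prior knowledge: every rule consequent starts at the LQR gain.
        k_lqr = np.asarray(k_lqr, dtype=float)
        self.K = np.tile(k_lqr, (len(self.centers), 1))
        # Assumption: the rule closest to upright keeps the LQR gain fixed.
        self.frozen = int(np.argmin(np.abs(self.centers)))

    def weights(self, x):
        # Normalised Gaussian firing strengths of the rules.
        theta = np.asarray(x, dtype=float)[self.angle_index]
        mu = np.exp(-0.5 * ((theta - self.centers) / self.sigma) ** 2)
        return mu / mu.sum()

    def act(self, x):
        # Weighted blend of the per-rule linear state-feedback laws.
        x = np.asarray(x, dtype=float)
        return float(-(self.weights(x) @ (self.K @ x)))

    def dpg_step(self, x, dq_du, lr=1e-2):
        # Deterministic policy gradient ascent: dJ/dK_i = dQ/du * du/dK_i,
        # where du/dK_i = -w_i(x) * x for this actor.
        x = np.asarray(x, dtype=float)
        grad = -np.outer(self.weights(x), x)
        grad[self.frozen] = 0.0  # leave the LQR-seeded rule untouched
        self.K += lr * dq_du * grad
```

Because the consequents are ordinary gain vectors, each row of `K` can be read as "if the angle is near centre i, apply state feedback with gain K_i", which is the kind of interpretability the abstract attributes to the fuzzy Actor. Before any update, all rules share the LQR gain, so the actor reproduces the LQR law exactly.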

Keywords: fuzzy reinforcement learning; T-S fuzzy model; inverted pendulum control; deterministic policy gradient; DDPG algorithm

Classification: TP273 [Automation and Computer Technology: Detection Technology and Automatic Devices]

 
