Robust observer-based deep reinforcement learning for attitude stabilization of a vertical takeoff and landing vehicle

Authors: LI Yanling; LUO Feizhou; GE Zhilei [1]

Affiliations: [1] School of Astronautics, Northwestern Polytechnical University, Xi'an 710072, China; [2] China Academy of Launch Vehicle Technology, Beijing 100076, China

Source: Systems Engineering and Electronics (《系统工程与电子技术》), 2024, No. 3, pp. 1038-1047 (10 pages)

Abstract: This paper addresses attitude stabilization of a vertical takeoff and landing (VTOL) vehicle subject to elastic vibration and model-uncertainty disturbances. It combines a robust observer with the proximal policy optimization (PPO) algorithm from deep reinforcement learning into a robust observer-based proximal policy optimization (ROB-PPO) control method. The robust observer reconstructs the vehicle attitude information corrupted by elastic vibration. The observer and the vehicle dynamics model together form the environment, and the reconstructed attitude serves as the state of the deep reinforcement learning algorithm. The agent interacts with this environment repeatedly and is thereby trained to stabilize the vehicle attitude. Simulation results show that ROB-PPO is more robust and converges faster than the commonly used adaptive fuzzy proportional-integral-derivative (PID) algorithm. Finally, the effectiveness of the proposed algorithm is verified on a self-developed VTOL vehicle.
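
The structure described in the abstract, with the observer placed inside the environment so that the agent only sees the reconstructed attitude, can be illustrated with a minimal sketch. This is not the authors' model or observer: it assumes a single-axis rigid body with one elastic mode added to the measurement, and it swaps in a plain Luenberger observer for the paper's robust observer. The class name `ObserverInLoopEnv`, all numerical parameters, and the quadratic reward are hypothetical choices made for illustration only.

```python
import numpy as np


class ObserverInLoopEnv:
    """Toy single-axis attitude model; the RL state is the observer estimate."""

    def __init__(self, dt=0.01, seed=0):
        self.dt = dt
        self.rng = np.random.default_rng(seed)
        # Assumed rigid-body model: theta_ddot = b * u with b = 2.0.
        self.A = np.array([[0.0, 1.0], [0.0, 0.0]])
        self.B = np.array([0.0, 2.0])
        self.C = np.array([1.0, 0.0])
        # Luenberger gain (stand-in for a robust observer); it places the
        # continuous-time observer poles of A - L*C at -10 and -10.
        self.L = np.array([20.0, 100.0])
        # Assumed elastic mode: q_ddot + 2*zeta*w*q_dot + w^2*q = k*u.
        self.zeta, self.w, self.k = 0.01, 30.0, 0.5
        self.reset()

    def reset(self):
        self.x = np.array([self.rng.uniform(-0.2, 0.2), 0.0])  # true [angle, rate]
        self.q = np.zeros(2)      # elastic [displacement, velocity]
        self.x_hat = np.zeros(2)  # observer estimate of [angle, rate]
        return self.x_hat.copy()

    def step(self, u):
        u = float(np.clip(u, -1.0, 1.0))
        dt = self.dt
        # The sensor sees the rigid attitude plus the elastic displacement.
        y = self.C @ self.x + self.q[0]
        # Observer update (forward Euler) driven by the corrupted measurement.
        innovation = y - self.C @ self.x_hat
        self.x_hat = self.x_hat + dt * (
            self.A @ self.x_hat + self.B * u + self.L * innovation
        )
        # True rigid-body dynamics with a random input disturbance
        # (a crude stand-in for model uncertainty).
        d = 0.05 * self.rng.standard_normal()
        self.x = self.x + dt * (self.A @ self.x + self.B * (u + d))
        # Elastic mode, semi-implicit Euler for numerical stability.
        q_acc = self.k * u - 2.0 * self.zeta * self.w * self.q[1] - self.w**2 * self.q[0]
        self.q[1] += dt * q_acc
        self.q[0] += dt * self.q[1]
        # Hypothetical quadratic reward on true attitude error and effort.
        reward = -(self.x[0] ** 2 + 0.1 * self.x[1] ** 2 + 0.01 * u**2)
        # The agent receives the reconstructed state, never the raw one.
        return self.x_hat.copy(), reward
```

A PPO agent would call `reset()` and `step(u)` in its rollout loop and treat the returned estimate as its state. Keeping the observer inside the environment means the policy is trained on the same filtered signal it would see in flight, which is the design choice the abstract describes.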

Keywords: vertical takeoff and landing vehicle; attitude control; robust observer; deep reinforcement learning

Classification: V448.113 [Aeronautical and Astronautical Science and Technology: Flight Vehicle Design]

 
