Reinforcement Learning Control and Its Application on Rocket-like Vehicle  Cited by: 3


Authors: HUANG Xu; LIU Jiarun [1,2]; JIA Chenhui; LUO Wuyi; GONG Qinghai; FENG Mingtao [1,2]

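Affiliations: [1] Beijing Aerospace Automatic Control Institute, Beijing 100854, China; [2] National Key Laboratory of Science and Technology on Aerospace Intelligence Control, Beijing 100854, China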

Source: Journal of Astronautics (宇航学报), 2023, No. 5, pp. 708-718 (11 pages)

Funding: National Natural Science Foundation of China (U21B2028).

Abstract: This paper studies attitude control of a rocket-like vehicle based on the deep deterministic policy gradient (DDPG) algorithm, covering algorithm design, agent training, simulation, and flight tests. A flight simulator is built on the vehicle's six-degree-of-freedom (6-DoF) model. For the hovering mode, the attitude angle tracking errors over multiple time steps, together with the attitude angular rates, form the state observable by the agent, and the control commands form the agent's action. The reward function combines tracking-error terms, control-command variation terms, and a one-time reward. The agent is trained in the simulator and then transferred from the simulation environment to the real system. Unlike the traditional design process, the vehicle model is not simplified by steps such as channel decomposition; the agent, a lightweight neural network, learns the attitude control policy solely by interacting with the simulator, and it performs well in both simulation and flight tests.
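The observation and reward structure described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the history length, the weights, the quadratic form of each term, and the rule for granting the one-time reward are all hypothetical, since the abstract gives none of these values. The names `ObservationBuffer` and `reward` are likewise invented for this sketch, and it assumes that only the tracking errors (not the angular rates) are stacked over several steps.

```python
from collections import deque

# Every constant below is an illustrative assumption; the abstract does not
# state the paper's actual values or the exact form of the reward.
N_STEPS = 3      # hypothetical number of past tracking errors kept
W_ERR = 1.0      # hypothetical weight on the attitude tracking error
W_DU = 0.1       # hypothetical weight on the control-command variation
BONUS = 10.0     # hypothetical one-time reward
ERR_TOL = 0.01   # hypothetical tolerance (rad) for granting the bonus


class ObservationBuffer:
    """Stacks the last n_steps attitude errors with the current angular rate."""

    def __init__(self, n_steps=N_STEPS, dim=3):
        # Start with zero errors so the observation has a fixed length.
        self.errors = deque([[0.0] * dim for _ in range(n_steps)], maxlen=n_steps)

    def push(self, att_error, ang_rate):
        """Record the newest error and return the flat observation vector."""
        self.errors.append(list(att_error))
        flat = [e for err in self.errors for e in err]  # oldest error first
        return flat + list(ang_rate)


def reward(att_error, command, prev_command, episode_done):
    """Penalise tracking error and command variation; add a one-time bonus."""
    err_cost = sum(e * e for e in att_error)
    du_cost = sum((u - p) ** 2 for u, p in zip(command, prev_command))
    r = -W_ERR * err_cost - W_DU * du_cost
    # Assumed rule: the one-time reward is paid once, at the end of an
    # episode, if the final tracking error is within tolerance.
    if episode_done and all(abs(e) < ERR_TOL for e in att_error):
        r += BONUS
    return r
```

Penalising the change in command between consecutive steps discourages chattering actuator commands, which is one common reason for including such a term when a policy trained in simulation must run on real hardware.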

Keywords: Reinforcement learning (RL); Deep deterministic policy gradient (DDPG); Attitude control; Flight test

Classification number: V249.1 [Aerospace Science and Technology: Flight Vehicle Design]; V448.2

 
