DDPG-based predictor-corrector guidance algorithm for launch vehicles  Cited by: 1

Predictor-corrector reentry guidance algorithm based on DDPG for reusable launch vehicle


Authors: Huang Xinyu (黄鑫宇), Dai Jing (代京)[1], Liu Gang (刘刚)[1] (China Academy of Launch Vehicle Technology, Beijing 100076, China)

Affiliation: [1] China Academy of Launch Vehicle Technology, Beijing 100076, China

Source: Aerospace Technology (《空天技术》), 2024, No. 5, pp. 89-102 (14 pages)

Abstract: During launch vehicle return, traditional numerical predictor-corrector guidance takes a long time to generate guidance commands, which limits its real-time performance. To address this, a neural network is introduced to generate the guidance commands. Because supervised machine learning needs a large amount of existing sample data to train the network, the deep deterministic policy gradient (DDPG) algorithm is combined with a flight profile generated from the quasi-equilibrium glide condition, so that the agent explores the simulation environment on its own and produces the data used to train the network. Through the design of the reinforcement learning reward function and the adjustment of the guidance cycle, the non-convergence caused by sparse rewards when training the return guidance law is resolved, enabling the reusable launch vehicle to complete flight tasks with process and terminal constraints. Comparative simulations show that the guidance law generated by reinforcement learning is slightly more accurate than the traditional predictor-corrector guidance law, while the computation time of the guidance commands is greatly reduced, so real-time performance is effectively improved. Monte Carlo simulations show that the guidance law maintains good accuracy and robustness under multiple kinds of disturbance, indicating some potential for engineering application.
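The abstract states that the sparse-reward problem is handled by reward design, but it does not give the reward function. As an illustration only, the sketch below uses potential-based reward shaping, a common generic technique that is not confirmed to be the authors' method. The function names, the range-error potential, the discount factor, and all numeric penalties and bonuses are hypothetical.

```python
GAMMA = 0.99  # discount factor (assumed value, not from the paper)


def potential(range_error_km: float) -> float:
    """Hypothetical potential: closer to zero when the predicted miss is smaller."""
    return -abs(range_error_km)


def shaped_reward(prev_error_km: float, next_error_km: float,
                  constraint_violated: bool, done: bool,
                  terminal_tol_km: float = 1.0) -> float:
    """Dense shaping term plus sparse penalty and terminal terms (all values assumed)."""
    # Potential-based shaping: gamma * phi(s') - phi(s) rewards each step
    # that reduces the predicted range error, so feedback is no longer sparse.
    reward = GAMMA * potential(next_error_km) - potential(prev_error_km)
    if constraint_violated:
        reward -= 10.0   # assumed penalty for breaking a process constraint
    if done and abs(next_error_km) <= terminal_tol_km:
        reward += 100.0  # assumed bonus for meeting the terminal constraint
    return reward
```

In this sketch a guidance step that reduces the predicted range error earns a positive reward immediately, so the agent gets feedback at every guidance cycle rather than only at the end of the flight.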

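Keywords: launch vehicle; deep reinforcement learning; predictor-corrector guidance; deep deterministic policy gradient (DDPG) algorithm; altitude-velocity profile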

Classification code: V448.235 [Aerospace Science and Technology: Flight Vehicle Design]

 
