Deep Reinforcement Meta-learning Guidance with Impact Angle Constraint

Cited by: 16

Authors: LIANG Chen; WANG Wei-hong [1]; LAI Chao

Affiliations: [1] School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China; [2] Navigation and Control Research Institute, China North Industries Group Corporation, Beijing 100089, China

Source: Journal of Astronautics (《宇航学报》), 2021, No. 5, pp. 611-620 (10 pages)

Funding: National Defense Basic Scientific Research Program of China (JCKY2018601B101).

Abstract: This paper proposes a three-dimensional impact-angle-constrained guidance law, based on deep reinforcement meta-learning and a time-to-go aware logistic function, for a varying-velocity missile with partial actuator failure intercepting a maneuvering target in the atmosphere. First, a model-based deep reinforcement meta-learning approach is adopted: a deep neural network is trained as a dynamics model and serves as the prediction model in model predictive path integral control. Because partial actuator failure and target maneuvers significantly change the environment during guidance, the deep neural dynamics model is adapted to these changes online via meta-learning. Second, a sampling method based on the skew normal distribution is proposed to improve the sampling efficiency of model predictive path integral control. Third, a time-to-go aware logistic function is introduced into the performance index of the guidance law, which reduces the acceleration command in the initial phase of guidance and increases the terminal velocity. Finally, numerical simulations under various conditions and Monte Carlo simulations verify the effectiveness of the proposed method in improving sampling efficiency and reducing initial-phase acceleration.
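The three ingredients named in the abstract (skew-normal sampling, path-integral weighting of sampled rollouts, and a logistic function of time-to-go) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the function names, the logistic form with its parameters `t_switch` and `k`, and the temperature `lam` are all assumptions made for illustration. The sampler uses the standard construction of a skew-normal variate from two independent Gaussians.

```python
import math
import random


def logistic_weight(t_go, t_switch=5.0, k=1.0):
    """Hypothetical time-to-go aware weight (assumed form, not from the paper).

    Close to 0 when t_go is large (early guidance), close to 1 near impact,
    so a penalty scaled by it demands little acceleration at the start.
    """
    return 1.0 / (1.0 + math.exp(k * (t_go - t_switch)))


def sample_skew_normal(alpha, rng):
    """Draw one standard skew-normal sample with shape parameter alpha."""
    delta = alpha / math.sqrt(1.0 + alpha * alpha)
    u0 = rng.gauss(0.0, 1.0)
    v = rng.gauss(0.0, 1.0)
    # delta*|u0| + sqrt(1 - delta^2)*v is skew-normal with shape alpha.
    return delta * abs(u0) + math.sqrt(1.0 - delta * delta) * v


def mppi_weights(costs, lam=1.0):
    """Softmax of negative rollout costs, shifted by the minimum for stability."""
    c_min = min(costs)
    unnorm = [math.exp(-(c - c_min) / lam) for c in costs]
    total = sum(unnorm)
    return [w / total for w in unnorm]


def mppi_update(u_nominal, perturbations, costs, lam=1.0):
    """Shift each nominal control by the cost-weighted average perturbation."""
    weights = mppi_weights(costs, lam)
    return [
        u_nominal[t] + sum(w * eps[t] for w, eps in zip(weights, perturbations))
        for t in range(len(u_nominal))
    ]
```

In a full controller, each perturbation sequence would be rolled out through the learned dynamics model to obtain its cost. The sketch takes the costs as given and covers only the sampling and weighting steps.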

Keywords: Missile; Impact angle constraint; Deep reinforcement meta-learning; Fault-tolerant control; Guidance

Classification: TJ765.3 [Armament Science and Technology: Weapon Systems and Utilization Engineering]
