A Missile Guidance and Penetration Strategy Based on Hierarchical Framework Hybrid Reinforcement Learning

Authors: TAN Minghu; HE Haolin; AI Wenjie; CHAI Bin (School of Astronautics, Northwestern Polytechnical University, Xi'an 710072, China)

Affiliation: [1] School of Astronautics, Northwestern Polytechnical University, Xi'an 710072, China

Source: Journal of Astronautics (《宇航学报》), 2025, No. 1, pp. 117-128 (12 pages)

Funding: Aeronautical Science Foundation (202400010530002).

Abstract: To address the problem of an attacking missile facing active defensive interception in a target-missile-defender three-body engagement scenario, a full-process intelligent guidance and penetration strategy based on hierarchical framework hybrid reinforcement learning is proposed. First, the guidance and penetration mission requirements of the attacking missile are analyzed, and a kinematic model of the three-body engagement is established. Second, a hybrid reinforcement learning method built on a two-layer policy structure is proposed to handle continuous and discrete action spaces separately. The low-level guidance and penetration models are trained with the proximal policy optimization (PPO) algorithm to generate guidance commands for the autopilot, while the high-level decision model is trained with the deep Q-network (DQN) algorithm and, at each decision instant, selects which low-level sub-model to invoke based on the global state. Through this hierarchical framework, the proposed strategy achieves real-time intelligent decision-making throughout the entire missile strike mission. Comparative experiments against a traditional synthesis guidance law show that the proposed strategy not only ensures the survivability of the attacking missile in the three-body engagement environment but also achieves a notable advantage in energy consumption.
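The two-layer structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the state layout, the two stand-in sub-policies, the toy Q-function, and the 5000 m switching range are all hypothetical placeholders. In the paper the low-level models are trained with PPO and the high-level selector with DQN. The sketch shows only the control flow, in which a discrete high-level choice dispatches to a continuous low-level command.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Assumed state layout (hypothetical, for illustration only):
# (LOS rate to target [rad/s], LOS rate to defender [rad/s], defender range [m])
State = Tuple[float, float, float]


def guidance_policy(state: State) -> float:
    """Stand-in for a trained PPO guidance actor (continuous action)."""
    return 3.0 * state[0]  # command proportional to target LOS rate


def penetration_policy(state: State) -> float:
    """Stand-in for a trained PPO penetration actor (continuous action)."""
    return -3.0 * state[1]  # maneuver against the defender's LOS rate


def toy_q_function(state: State) -> List[float]:
    """Stand-in for a trained DQN: one Q-value per low-level sub-model.

    Hypothetical rule: favor guidance while the defender is beyond 5000 m.
    """
    return [state[2] - 5000.0, 0.0]


class HierarchicalController:
    """High level picks a sub-model (discrete); low level outputs the command."""

    def __init__(
        self,
        q_function: Callable[[State], Sequence[float]],
        sub_policies: Sequence[Callable[[State], float]],
        epsilon: float = 0.0,
        seed: int = 0,
    ) -> None:
        self.q_function = q_function
        self.sub_policies = list(sub_policies)
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def select_option(self, state: State) -> int:
        # Epsilon-greedy choice over the discrete set of sub-models.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.sub_policies))
        q_values = self.q_function(state)
        return max(range(len(q_values)), key=lambda i: q_values[i])

    def act(self, state: State) -> Tuple[int, float]:
        option = self.select_option(state)
        return option, self.sub_policies[option](state)


controller = HierarchicalController(
    toy_q_function, [guidance_policy, penetration_policy]
)
far_option, far_command = controller.act((0.01, 0.02, 8000.0))
near_option, near_command = controller.act((0.01, 0.02, 2000.0))
```

With `epsilon=0.0` the selection is purely greedy, so under the toy Q-function the controller invokes the guidance sub-model when the defender is far away and the penetration sub-model when it is close.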

Keywords: reinforcement learning; guidance and penetration strategy; proximal policy optimization (PPO); deep Q-network (DQN)

Classification: V448 [Aerospace Science and Technology: Flight Vehicle Design]
