A Penetration Strategy Combining Deep Reinforcement Learning and Imitation Learning  (Cited by: 4)


Authors: WANG Xiaofang (王晓芳) [1]; GU Kunren (顾焜仁). School of Aerospace Engineering, Beijing Institute of Technology, Beijing 100081, China

Affiliation: [1] School of Aerospace Engineering, Beijing Institute of Technology, Beijing 100081

Source: Journal of Astronautics (《宇航学报》), 2023, No. 6, pp. 914-925 (12 pages)

Funding: National Natural Science Foundation of China (11502019).

Abstract: A fighter attacking a target may be engaged by a defense missile. It must then satisfy two requirements at once: penetrating the interception and striking the target afterwards. To address this, an intelligent maneuver-penetration algorithm for fighters based on deep reinforcement learning and imitation learning theory is proposed. First, the fighter penetration problem is formulated as a Markov decision process. A reward function that accounts for both penetration and strike is designed from three quantities:
1. the relative distance between the fighter and the defense missile;
2. the distance between the fighter and the target after penetration;
3. the velocity lead angle of the fighter relative to the fighter-target line of sight.

Next, the Proximal Policy Optimization (PPO) algorithm is combined with imitation learning theory. This yields a Generative Adversarial Imitation Learning-Proximal Policy Optimization (GAIL-PPO) intelligent penetration network consisting of a discriminator network, an actor network, and a critic network. Finally, the intelligent penetration network is trained with the help of an expert strategy. Simulation results show that in the early stage of training the GAIL-PPO penetration strategy draws heavily on the expert strategy's experience and therefore converges quickly. In the later stage it explores the complex environment fully and achieves better performance than the expert strategy.
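The abstract names the three quantities the reward is built from but does not give its functional form. The sketch below shows one plausible way such a two-phase reward could be shaped. It rewards staying away from the defense missile before penetration, then penalizes the remaining distance to the target and the velocity lead angle afterwards. Everything here is an assumption for illustration, not the authors' formulation: the planar geometry, the `PlanarState` fields, the weights `w_miss`/`w_tgt`/`w_eta`, the kill radius, and the normalizing scale `d_scale`.

```python
import math
from dataclasses import dataclass


@dataclass
class PlanarState:
    """Hypothetical 2-D engagement state (positions in metres, heading in radians)."""
    fx: float       # fighter position
    fy: float
    heading: float  # direction of the fighter velocity vector
    mx: float       # defense missile position
    my: float
    tx: float       # target position
    ty: float


def lead_angle(s: PlanarState) -> float:
    """Angle between the fighter velocity and the fighter-target line of sight, wrapped to [-pi, pi]."""
    los = math.atan2(s.ty - s.fy, s.tx - s.fx)
    diff = s.heading - los
    return math.atan2(math.sin(diff), math.cos(diff))


def reward(s: PlanarState, penetrated: bool,
           kill_radius: float = 20.0, d_scale: float = 10_000.0,
           w_miss: float = 1.0, w_tgt: float = 1.0, w_eta: float = 0.5) -> float:
    """Hypothetical shaped reward: survive the interceptor first, then line up on the target."""
    d_miss = math.hypot(s.mx - s.fx, s.my - s.fy)
    d_tgt = math.hypot(s.tx - s.fx, s.ty - s.fy)
    if not penetrated:
        if d_miss <= kill_radius:
            return -100.0  # intercepted: large terminal penalty (assumed value)
        # reward keeping distance from the missile, saturating at d_scale
        return w_miss * min(d_miss / d_scale, 1.0)
    # after penetration: approach the target with a small lead angle
    return -w_tgt * d_tgt / d_scale - w_eta * abs(lead_angle(s)) / math.pi
```

The phase switch keeps the two objectives from fighting each other during the evasive maneuver. A real design would likely also add shaping terms so the strike geometry is not ignored while evading.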
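To make the GAIL-PPO structure concrete, here is a minimal sketch of the three generic ingredients such a network combines:
- a discriminator that scores how expert-like a state-action pair is;
- the standard GAIL surrogate reward `-log(1 - D)`;
- the PPO clipped surrogate objective used to update the actor.

Several pieces are assumptions rather than the paper's method: the linear-logistic discriminator (the paper uses a neural network), and the linearly decaying blend `mixed_reward` between imitation and task reward, which only mimics the "learn from the expert early, explore later" behavior the abstract reports. The advantages fed to `ppo_clip_loss` would come from the critic network, which is not shown here.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def discriminator(w: Sequence[float], b: float, sa: Sequence[float]) -> float:
    """Hypothetical linear-logistic discriminator: P(state-action pair came from the expert)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, sa)) + b)


def discriminator_loss(d_expert: Sequence[float], d_policy: Sequence[float],
                       eps: float = 1e-8) -> float:
    """Binary cross-entropy: expert pairs labelled 1, policy pairs labelled 0."""
    le = -sum(math.log(max(d, eps)) for d in d_expert) / len(d_expert)
    lp = -sum(math.log(max(1.0 - d, eps)) for d in d_policy) / len(d_policy)
    return le + lp


def gail_reward(d: float, eps: float = 1e-8) -> float:
    """GAIL surrogate reward -log(1 - D): large when the policy's pair looks expert-like."""
    return -math.log(max(1.0 - d, eps))


def mixed_reward(r_env: float, r_gail: float, step: int, decay_steps: int = 100_000) -> float:
    """Assumed schedule: weight the imitation reward early, shift to the task reward later."""
    alpha = max(0.0, 1.0 - step / decay_steps)
    return alpha * r_gail + (1.0 - alpha) * r_env


def ppo_clip_loss(ratios: Sequence[float], advantages: Sequence[float],
                  clip_eps: float = 0.2) -> float:
    """PPO clipped surrogate (negated, so minimising it improves the actor).

    ratios[i] = pi_new(a_i | s_i) / pi_old(a_i | s_i); advantages come from the critic.
    """
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped = max(min(r, 1.0 + clip_eps), 1.0 - clip_eps)
        total += min(r * a, clipped * a)
    return -total / len(ratios)
```

Because the discriminator is retrained alongside the actor, the imitation reward keeps moving. Decaying its weight lets the task reward dominate once the policy already behaves like the expert.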

Keywords: fighter; maneuver penetration; intelligent penetration; deep reinforcement learning; imitation learning

CLC number: V249.31 [Aeronautical and Astronautical Science and Technology: Flight Vehicle Design]
