A Method for Generating Trajectory Protection and Jamming Strategy for Aircraft Based on Reinforcement Learning Algorithm

Authors: ZHOU Bin[1]; SHANG Xi; LIU Feng[1]; SU Zhonghua (Southwest China Research Institute of Electronic Equipment, Chengdu 610036, China)

Affiliation: [1] The 29th Research Institute of China Electronics Technology Group Corporation, Chengdu 610036

Source: Electronic Information Warfare Technology (《电子信息对抗技术》), 2025, No. 2, pp. 15-23 (9 pages)

Abstract: To address the problem of how an aircraft carrying a jamming module can use trajectory avoidance in a complex electromagnetic environment to maximize its penetration capability, a reinforcement-learning-based method for generating aircraft trajectory protection and jamming strategies is proposed. The electromagnetic countermeasure scenario consists of multiple S-band and C-band radars. The signal-to-noise ratio of the echo signal is calculated after processing by the anti-jamming module, and the Swerling II and probability criterion models are nested to study trajectory protection and jamming strategy allocation for the aircraft at a given false-alarm level. Three Markov-chain-based algorithms, Sarsa, deep Q-network (DQN), and Dueling-DQN, are selected, and an objective function composed of a trajectory evaluation term and a jamming-effect evaluation term is introduced for optimization. Comparing the trajectory planning results, the probability of the radar entering tracking mode, and the aircraft's action allocation confirms that an aircraft using reinforcement learning can, in the course of its cognitive interaction with the environment, prevent the radar from entering guidance mode through its own trajectory planning and jamming strategy generation. Compared with jamming resource allocation along a fixed trajectory, this effectively improves the aircraft's self-protection capability. Finally, in terms of the probability of the radar entering tracking mode, the DQN algorithm outperforms the other two algorithms.
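The abstract names three building blocks without giving their formulas: a Swerling II detection model at a fixed false-alarm level, the Sarsa update, and the Dueling-DQN value decomposition. The following is a minimal Python sketch of the textbook forms of each, not the paper's implementation. All function names and parameters are hypothetical, and three assumptions are made: detection uses a single pulse (where the Swerling II closed form is Pd = Pfa^(1/(1+SNR))), the Q-table is a plain dictionary of lists, and the dueling head uses the common mean-subtracted aggregation. The paper's actual pulse integration, anti-jamming processing, and objective function are not stated in the abstract and are not modeled here.

```python
def swerling2_single_pulse_pd(snr_db: float, pfa: float) -> float:
    """Detection probability for a Rayleigh-fluctuating (Swerling II) target.

    Assumption: a single pulse, so Pd = Pfa ** (1 / (1 + SNR)) with SNR in
    linear units. Multi-pulse integration would need a different formula.
    """
    snr_linear = 10.0 ** (snr_db / 10.0)
    return pfa ** (1.0 / (1.0 + snr_linear))


def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.9):
    """One on-policy Sarsa step on a tabular Q function.

    q is a dict mapping state -> list of action values (hypothetical layout).
    The target uses the action actually chosen in next_state, which is what
    distinguishes Sarsa from Q-learning's max over next actions.
    """
    td_target = reward + gamma * q[next_state][next_action]
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


def dueling_q_values(state_value, advantages):
    """Combine a state value V(s) and advantages A(s, a) into Q(s, a).

    Uses the mean-subtracted aggregation Q = V + A - mean(A), a common
    choice for dueling architectures.
    """
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


if __name__ == "__main__":
    # Lower SNR at the radar (for example under jamming) lowers Pd.
    for snr_db in (0.0, 10.0, 20.0):
        print(snr_db, swerling2_single_pulse_pd(snr_db, pfa=1e-6))

    q_table = {0: [0.0, 0.0], 1: [1.0, 0.0]}
    print(sarsa_update(q_table, 0, 0, reward=1.0, next_state=1,
                       next_action=0, alpha=0.5, gamma=0.9))
    print(dueling_q_values(1.0, [1.0, 2.0, 3.0]))
```

In a setup like the one the abstract describes, a detection probability of this kind could feed a reward term that penalizes states where the radar is likely to enter tracking mode. How the paper combines it with the trajectory evaluation term is not given here.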

Keywords: reinforcement learning; multifunction radar; aircraft trajectory protection; jamming strategy; Markov chain

Classification code: TN974 [Electronics and Telecommunications: Signal and Information Processing]
