Dynamic Decision-making Method of Unmanned Platform Chaff Jamming for Terminal Defense Based on Multi-agent Deep Reinforcement Learning


Authors: LI Chuanhao; MING Zhenjun; WANG Guoxin[1,2]; YAN Yan[1,2]; DING Wei; WAN Silai; DING Tao

Affiliations: [1] School of Mechanical Engineering, Beijing Institute of Technology, Beijing 100081, China; [2] Yangtze Delta Region Academy of Beijing Institute of Technology (Jiaxing), Jiaxing 314019, Zhejiang, China; [3] Southwest China Research Institute of Electronic Equipment, Chengdu 610036, Sichuan, China

Source: Acta Armamentarii (《兵工学报》), 2025, No. 3, pp. 19-33 (15 pages)

Funding: National Natural Science Foundation of China (Grant No. 62373047).

Abstract: Chaff centroid jamming from unmanned platforms is an important means of missile terminal defense, and the ability to make intelligent decisions about platform maneuvering and chaff launching is a key factor in whether strategic assets are protected successfully. Existing decision-making methods, such as computational analysis based on mechanism models and space exploration based on heuristic algorithms, suffer from a low degree of intelligence, poor adaptability, and slow decision-making. To address these problems, a dynamic decision-making method of chaff jamming for terminal defense based on multi-agent deep reinforcement learning is proposed. The problem of cooperative multi-platform chaff jamming for terminal defense is defined and a simulation environment is constructed; a missile guidance and fuze model, an unmanned jamming platform maneuvering model, a chaff diffusion model, and a centroid jamming model are established. The centroid jamming decision problem is transformed into a Markov decision problem: decision-making agents are constructed, the state and action spaces are defined, and a reward function is set. The decision-making agents are trained with the multi-agent proximal policy optimization (MAPPO) algorithm. Simulation results show that, compared with the multi-agent deep deterministic policy gradient (MADDPG) algorithm, decision-making with the trained agents reduces the training time by 85.5% and increases the success rate of asset protection by 3.84 times; compared with the genetic algorithm (GA), it reduces the decision time by 99.96% and increases the success rate of asset protection by 1.12 times.
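This record does not reproduce the paper's centroid jamming model. As a rough illustration of the general idea only, the sketch below assumes the common textbook simplification that a radar seeker aims at the RCS-weighted centroid of all scatterers inside one resolution cell. The function name `apparent_centroid`, the positions, and the RCS values are hypothetical and are not taken from the paper.

```python
import numpy as np


def apparent_centroid(positions, rcs):
    """Return the RCS-weighted centroid of scatterers in one resolution cell.

    Assumption (not from the paper): the seeker's aim point is the
    RCS-weighted mean of the scatterer positions.
    """
    positions = np.asarray(positions, dtype=float)  # shape (n, dims)
    rcs = np.asarray(rcs, dtype=float)              # shape (n,)
    total = rcs.sum()
    if total <= 0:
        raise ValueError("total RCS must be positive")
    # Weight each position by its RCS, then normalize by the total RCS.
    return (rcs[:, None] * positions).sum(axis=0) / total


# Hypothetical values: an asset at the origin and a chaff cloud 300 m away
# whose RCS is three times that of the asset.
asset_position = [0.0, 0.0]
chaff_position = [300.0, 0.0]
aim_point = apparent_centroid([asset_position, chaff_position], [1000.0, 3000.0])
print(aim_point)
```

Under this simplification, the larger the chaff cloud's RCS relative to the asset, the further the aim point moves away from the asset. That is why the platform position and the chaff launch timing are natural decision variables for a learned policy.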

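Keywords: unmanned platform; centroid jamming; chaff jamming; terminal defense; multi-agent reinforcement learning; electronic countermeasures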

CLC number: TN972 [Electronics and Telecommunications: Signal and Information Processing]

 
