Research on Multi-UAV Cooperative Pursuit and Confrontation Strategy Based on the P3C-MADDPG Algorithm

Authors: GAO Jiabo (高甲博), XIAO Wei (肖玮), HE Zhijie (何智杰)

Affiliations: [1] Department of Military Logistics, Army Logistics Academy of the Chinese People's Liberation Army, Chongqing 400000, China; [2] Unit 95019 of the Chinese People's Liberation Army, Xiangyang, Hubei 441100, China; [3] Unit 31680 of the Chinese People's Liberation Army, Chongzhou, Sichuan 611230, China

Source: Command Control & Simulation (《指挥控制与仿真》), 2023, No. 6, pp. 7-18 (12 pages)

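Funding: Science and Technology Research Program of the Chongqing Municipal Education Commission (KJZD-K202312903); Graduate Research and Innovation Project of the Army Logistics Academy (LQ-ZD-202209); Research Project of the Army Logistics Academy (LQ-ZD-202316); Chongqing Graduate Research and Innovation Project (CYS23778).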

Abstract: For the task of multi-UAV cooperative pursuit and confrontation against an escaping UAV whose strategy is unknown, a cooperative pursuit strategy based on the P3C-MADDPG algorithm is proposed. First, to address the slow training and Q-value overestimation of the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm, MADDPG is extended with a Prioritized Experience Replay (PER) mechanism backed by tree-structured storage and with a purpose-designed three-thread parallel Critic network model; the result is the P3C-MADDPG algorithm. Then, based on a constructed UAV kinematics model, the training elements are designed: the state space of the pursuing and evading UAVs, a reward function that combines sparse rewards with guided rewards, and pursuit and evasion action spaces that differ in acceleration. Finally, using these training elements, P3C-MADDPG generates the multi-UAV cooperative pursuit strategy for an environment containing an escaping UAV of unknown strategy. Simulation experiments show that P3C-MADDPG improves training speed by 11.7% on average and lowers the Q value by 6.06% on average, and that the generated cooperative pursuit strategy effectively avoids obstacles and achieves intelligent pursuit of the escaping UAV whose strategy is unknown.

Keywords: P3C-MADDPG; cooperative pursuit and confrontation strategy; prioritized experience replay; Q value; multi-UAV

CLC number: E911 [Military science]
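
The abstract says only that P3C-MADDPG uses prioritized experience replay "based on tree-structured storage". A minimal sketch, assuming the standard proportional PER variant backed by a sum tree, is given here to show why the tree matters: drawing a transition with probability proportional to its priority, and updating that priority after a TD error is computed, both cost O(log N) instead of an O(N) scan. The class names `SumTree` and `PrioritizedReplay` and the values of `alpha`, `beta`, and `eps` are illustrative assumptions, not the paper's implementation.

```python
import random


class SumTree:
    """Array-backed binary sum tree: every internal node holds the sum of its two children."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity - 1)  # internal nodes first, then `capacity` leaves
        self.data = [None] * capacity           # transition stored for each leaf
        self.write = 0                          # next slot to overwrite (ring buffer)
        self.size = 0

    def total(self):
        return self.tree[0]  # root = sum of all leaf priorities

    def update(self, tree_idx, priority):
        """Set a leaf priority and propagate the difference up to the root in O(log N)."""
        change = priority - self.tree[tree_idx]
        self.tree[tree_idx] = priority
        while tree_idx != 0:
            tree_idx = (tree_idx - 1) // 2
            self.tree[tree_idx] += change

    def add(self, priority, transition):
        tree_idx = self.write + self.capacity - 1
        self.data[self.write] = transition
        self.update(tree_idx, priority)
        self.write = (self.write + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def get(self, value):
        """Walk down to the leaf whose cumulative-priority interval contains `value`."""
        idx = 0
        while 2 * idx + 1 < len(self.tree):  # stop once idx is a leaf
            left, right = 2 * idx + 1, 2 * idx + 2
            # Go left if value falls in the left mass; never descend into an all-zero subtree.
            if self.tree[left] > 0 and (value <= self.tree[left] or self.tree[right] <= 0):
                idx = left
            else:
                value -= self.tree[left]
                idx = right
        return idx, self.tree[idx], self.data[idx - self.capacity + 1]


class PrioritizedReplay:
    """Proportional PER: P(i) = p_i^alpha / sum_k p_k^alpha, with importance-sampling weights."""

    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6):
        self.tree = SumTree(capacity)
        self.alpha, self.beta, self.eps = alpha, beta, eps
        self.max_priority = 1.0

    def push(self, transition):
        # New transitions get the current maximum priority so each is replayed at least once.
        self.tree.add(self.max_priority, transition)

    def sample(self, batch_size):
        """Stratified sampling: one draw from each of `batch_size` equal slices of the total mass."""
        batch, idxs, weights = [], [], []
        total = self.tree.total()
        segment = total / batch_size
        for i in range(batch_size):
            idx, p, transition = self.tree.get(random.uniform(segment * i, segment * (i + 1)))
            prob = p / total
            weights.append((self.tree.size * prob) ** (-self.beta))  # corrects sampling bias
            batch.append(transition)
            idxs.append(idx)
        max_w = max(weights)
        return batch, idxs, [w / max_w for w in weights]  # largest weight normalised to 1

    def update_priorities(self, idxs, td_errors):
        """Priority p = (|TD error| + eps)^alpha, written back into the tree."""
        for idx, err in zip(idxs, td_errors):
            p = (abs(err) + self.eps) ** self.alpha
            self.tree.update(idx, p)
            self.max_priority = max(self.max_priority, p)
```

In a MADDPG-style training loop, the returned weights would scale each sample's Critic loss, and `update_priorities` would be called with the new TD errors of the sampled indices after each update.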
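
The abstract states that a three-thread parallel Critic network model is used to reduce Q-value overestimation, but not how the three estimates are combined. The sketch below assumes a TD3-style rule, taking the minimum over three target critics when forming the TD target. The linear "critics", the function names `make_linear_critic` and `ensemble_td_target`, and `gamma = 0.95` are hypothetical placeholders for the paper's neural networks and hyperparameters.

```python
from concurrent.futures import ThreadPoolExecutor


def make_linear_critic(weights, bias):
    """Hypothetical stand-in for a target Critic: Q(x, a) = w . [x, a] + b.

    In MADDPG the centralised Critic sees the global observation x and the joint action a.
    """
    def critic(global_obs, joint_action):
        features = list(global_obs) + list(joint_action)
        return sum(w * f for w, f in zip(weights, features)) + bias
    return critic


def ensemble_td_target(target_critics, reward, next_obs, next_joint_action, done, gamma=0.95):
    """TD target y = r + gamma * (1 - done) * min_k Q'_k(x', a'), one thread per critic.

    Using the minimum instead of a single critic's estimate biases the target downward,
    counteracting the overestimation that arises when the actor maximises a noisy Q estimate.
    """
    with ThreadPoolExecutor(max_workers=len(target_critics)) as pool:
        q_values = list(pool.map(lambda q: q(next_obs, next_joint_action), target_critics))
    return reward + gamma * (1.0 - done) * min(q_values), q_values
```

For example, three critics returning 1.0, 2.0 and 3.0 with reward 0.5 give a target of 0.5 + 0.95 × 1.0 = 1.45, lower than the 2.4 a single critic returning 2.0 would produce. In CPython, threads give a real speed-up only when each critic evaluation runs in native code that releases the GIL; here they only illustrate the parallel structure.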

 
