Authors: GAO Jiabo; XIAO Wei; HE Zhijie
Affiliations: [1] Department of Military Logistics, Army Logistics Academy, Chongqing 400000, China; [2] Unit 95019 of the People's Liberation Army, Xiangyang 441100, China; [3] Unit 31680 of the People's Liberation Army, Chongzhou 611230, China
Source: Command Control & Simulation, 2023, No. 6, pp. 7-18 (12 pages)
Funding: Science and Technology Research Project of Chongqing Municipal Education Commission (KJZD-K202312903); Postgraduate Research and Innovation Project of Army Logistics Academy (LQ-ZD-202209); Research Project of Army Logistics Academy (LQ-ZD-202316); Chongqing Postgraduate Research and Innovation Project (CYS23778).
Abstract: For the task of cooperative pursuit by multiple UAVs against an evading UAV with an unknown strategy, a multi-UAV cooperative pursuit and confrontation strategy based on the P3C-MADDPG algorithm is proposed. First, to address the slow training speed and Q-value overestimation of the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm, a Prioritized Experience Replay (PER) mechanism with tree-structured storage and a three-thread parallel Critic network model are introduced into MADDPG, yielding the P3C-MADDPG algorithm. Then, based on the constructed UAV kinematics model, training elements are designed, including the state space of the pursuing and evading UAVs, a reward function combining sparse and guided rewards, and pursuit-evasion action spaces with different accelerations. Finally, using these training elements, the P3C-MADDPG algorithm generates the multi-UAV cooperative pursuit strategy in an environment where the evading UAV's strategy is unknown. Simulation experiments show that P3C-MADDPG improves training speed by 11.7% on average and reduces Q values by 6.06% on average; the generated multi-UAV cooperative pursuit strategy effectively avoids obstacles and achieves intelligent pursuit of an evading UAV with an unknown strategy.
Keywords: P3C-MADDPG; cooperative pursuit and confrontation strategy; prioritized experience replay; Q value; multi-UAV
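The tree-structured storage for prioritized experience replay mentioned in the abstract is commonly realized as a sum tree, where leaves hold transition priorities and internal nodes hold partial sums, so sampling proportional to priority costs O(log n). A minimal Python sketch follows; this is an illustrative reconstruction under common PER conventions, not the authors' code, and it assumes the buffer capacity is a power of two:

```python
import random

class SumTree:
    """Binary sum tree for prioritized experience replay.

    Leaves (indices [capacity, 2*capacity)) store priorities; each internal
    node stores the sum of its children, so the root holds the total priority.
    Capacity is assumed to be a power of two for a complete tree layout.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity)  # node 1 is the root
        self.data = [None] * capacity       # transitions parallel to the leaves
        self.write = 0                      # next leaf slot (circular)
        self.size = 0

    def add(self, priority, transition):
        """Store a transition at the next leaf and set its priority."""
        leaf = self.write + self.capacity
        self.data[self.write] = transition
        self.update(leaf, priority)
        self.write = (self.write + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def update(self, leaf, priority):
        """Set a leaf's priority and propagate the change up to the root."""
        change = priority - self.tree[leaf]
        self.tree[leaf] = priority
        idx = leaf
        while idx > 1:
            idx //= 2
            self.tree[idx] += change

    def sample(self):
        """Draw one transition with probability proportional to its priority."""
        s = random.uniform(0.0, self.tree[1])
        idx = 1
        while idx < self.capacity:          # descend until a leaf is reached
            left = 2 * idx
            if s <= self.tree[left]:
                idx = left
            else:
                s -= self.tree[left]
                idx = left + 1
        return idx, self.tree[idx], self.data[idx - self.capacity]
```

In a PER-augmented MADDPG loop, each sampled transition's priority would be refreshed via `update()` using the new TD error, so transitions with larger errors are replayed more often.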