Authors: GAO Jiabo; XIAO Wei; HE Zhijie
Affiliations: [1] Department of Military Logistics, Army Logistics Academy, Chongqing 400000, China; [2] Unit 95019 of the People's Liberation Army, Xiangyang 441100, China; [3] Unit 31680 of the People's Liberation Army, Chongzhou 611230, China
Source: Command Control & Simulation, 2023, No. 6, pp. 7-18 (12 pages)
Funding: Science and Technology Research Project of Chongqing Municipal Education Commission (KJZD-K202312903); Postgraduate Research and Innovation Project of Army Logistics Academy (LQ-ZD-202209); Research Project of Army Logistics Academy (LQ-ZD-202316); Chongqing Postgraduate Research and Innovation Project (CYS23778).
Abstract: For the task of cooperative pursuit by multiple UAVs against an evading UAV with an unknown strategy, a multi-UAV cooperative pursuit and confrontation strategy based on the P3C-MADDPG algorithm is proposed. First, to address the slow training speed and Q-value overestimation of the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm, a Prioritized Experience Replay (PER) mechanism with tree-structured storage and a three-thread parallel Critic network model are introduced into MADDPG, yielding the P3C-MADDPG algorithm. Then, based on the constructed UAV kinematics model, training elements are designed, including the state space of the pursuing and evading UAVs, a reward function combining sparse and guided rewards, and pursuit-evasion action spaces with different accelerations. Finally, using these training elements, the P3C-MADDPG algorithm generates the multi-UAV cooperative pursuit strategy in an environment where the evading UAV's strategy is unknown. Simulation experiments show that P3C-MADDPG improves training speed by 11.7% on average and reduces Q values by 6.06% on average; the generated multi-UAV cooperative pursuit strategy effectively avoids obstacles and achieves intelligent pursuit of an evading UAV with an unknown strategy.
Keywords: P3C-MADDPG; cooperative pursuit and confrontation strategy; prioritized experience replay; Q value; multi-UAV
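The tree-structured storage for prioritized experience replay mentioned in the abstract is commonly realized as a sum tree, where leaves hold transition priorities and internal nodes hold partial sums, so sampling proportional to priority costs O(log n). A minimal Python sketch follows; this is an illustrative reconstruction under common PER conventions, not the authors' code, and it assumes the buffer capacity is a power of two:

```python
import random

class SumTree:
    """Binary sum tree for prioritized experience replay.

    Leaves (indices [capacity, 2*capacity)) store priorities; each internal
    node stores the sum of its children, so the root holds the total priority.
    Capacity is assumed to be a power of two for a complete tree layout.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity)  # node 1 is the root
        self.data = [None] * capacity       # transitions parallel to the leaves
        self.write = 0                      # next leaf slot (circular)
        self.size = 0

    def add(self, priority, transition):
        """Store a transition at the next leaf and set its priority."""
        leaf = self.write + self.capacity
        self.data[self.write] = transition
        self.update(leaf, priority)
        self.write = (self.write + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def update(self, leaf, priority):
        """Set a leaf's priority and propagate the change up to the root."""
        change = priority - self.tree[leaf]
        self.tree[leaf] = priority
        idx = leaf
        while idx > 1:
            idx //= 2
            self.tree[idx] += change

    def sample(self):
        """Draw one transition with probability proportional to its priority."""
        s = random.uniform(0.0, self.tree[1])
        idx = 1
        while idx < self.capacity:          # descend until a leaf is reached
            left = 2 * idx
            if s <= self.tree[left]:
                idx = left
            else:
                s -= self.tree[left]
                idx = left + 1
        return idx, self.tree[idx], self.data[idx - self.capacity]
```

In a PER-augmented MADDPG loop, each sampled transition's priority would be refreshed via `update()` using the new TD error, so transitions with larger errors are replayed more often.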