Authors: Ershen WANG; Jihao CHEN; Chen HONG; Fan LIU; Aidong CHEN; Hongyuan JING
Affiliations: [1] School of Electronic and Information Engineering, Shenyang Aerospace University, Shenyang 110136, China; [2] School of Civil Aviation, Shenyang Aerospace University, Shenyang 110136, China; [3] Multi-Agent Systems Research Centre, Beijing Union University, Beijing 100101, China; [4] College of Robotics, Beijing Union University, Beijing 100101, China
Source: Scientia Sinica Informationis, 2024, No. 7, pp. 1775-1792 (18 pages)
Funding: National Key R&D Program of China (Grant No. 2018AAA0100804); National Natural Science Foundation of China (Grant No. 62173237); Scientific Research Project of Beijing Union University (Grant Nos. ZK30202304, SK160202103, ZK50201911, ZK30202107); Open Fund of the State Key Laboratory of Satellite Navigation System and Equipment Technology (Grant No. CEPNT2022A01); Fundamental Research Funds for Liaoning Provincial Undergraduate Universities (Grant Nos. 20240177, 20240215); Shenyang Science and Technology Program (Grant No. 22-322-3-34).
Abstract: The collaborative adversarial game of unmanned aerial vehicles (UAVs) is becoming increasingly widespread and profound, especially as UAV swarms play an ever more important role in collaborative detection, global confrontation, strategic deception, and other confrontation tasks. Reliable and efficient UAV swarm game methods are currently a hot research topic. This paper introduces the counterfactual baseline concept into the UAV swarm adversarial environment and proposes a UAV swarm adversarial game method based on counterfactual multi-agent policy gradients (COMA). In a UAV confrontation environment with infinite continuous states and actions, incorporating the UAV dynamics model, we set up realistic attack conditions and reward functions and construct a UAV swarm adversarial game model based on multi-agent deep reinforcement learning. The red and blue UAV swarms adopt different adversarial game methods, and asymmetric adversarial experiments are conducted in the multi-agent particle environment (MPE). The experimental results show that the average cumulative reward converges to a Nash equilibrium. For the 4 vs. 8 adversarial decision-making scenario, the average hit rate of COMA is 39% and 17% higher than that of DQN and MADDPG, respectively, while the average win rate is 34% and 17% higher than that of DQN and MADDPG, respectively. Finally, the practicality and robustness of COMA for UAV swarm adversarial game tasks are confirmed through an in-depth analysis of its convergence and stability.
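The counterfactual baseline named in the abstract credits each agent by comparing the centralized critic's value of the joint action against the expectation obtained by marginalizing out only that agent's own action. The sketch below illustrates that computation for one agent with a discrete action set; the function name, argument shapes, and values are illustrative assumptions, not the paper's code.

```python
# Minimal sketch of the COMA counterfactual advantage for one agent a:
#   A_a(s, u) = Q(s, u) - sum_{u'_a} pi(u'_a | tau_a) * Q(s, (u_-a, u'_a))
# All names and shapes here are assumptions for illustration.
def counterfactual_advantage(q_values, policy_probs, chosen_action):
    """q_values:      Q(s, (u_-a, u'_a)) for each alternative action u'_a of
                      agent a, with the other agents' joint action u_-a fixed.
       policy_probs:  agent a's policy pi(u'_a | tau_a) over those actions.
       chosen_action: index of the action u_a that agent a actually executed."""
    # Counterfactual baseline: expected Q under agent a's own policy,
    # marginalizing out only agent a's action (other agents held fixed).
    baseline = sum(p * q for p, q in zip(policy_probs, q_values))
    return q_values[chosen_action] - baseline

# Example: 3 discrete actions, uniform policy. The baseline is
# (1.0 + 2.0 + 3.0) / 3 = 2.0, so the chosen action (Q = 3.0) gets
# a positive advantage of about 1.0.
adv = counterfactual_advantage([1.0, 2.0, 3.0], [1 / 3, 1 / 3, 1 / 3], 2)
```

Because the baseline depends only on agent a's own policy, subtracting it does not bias the policy gradient but isolates each agent's individual contribution, which is what lets COMA assign credit in a swarm where all agents share one team reward.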
Keywords: UAV swarm; adversarial game; multi-agent; deep reinforcement learning; Nash equilibrium
Classification: TP18 [Automation and Computer Technology - Control Theory and Control Engineering]; V279 [Automation and Computer Technology - Control Science and Engineering]