Authors: Ershen WANG; Jihao CHEN; Chen HONG; Fan LIU; Aidong CHEN; Hongyuan JING
Affiliations: [1] School of Electronic and Information Engineering, Shenyang Aerospace University, Shenyang 110136, China; [2] School of Civil Aviation, Shenyang Aerospace University, Shenyang 110136, China; [3] Multi-Agent Systems Research Centre, Beijing Union University, Beijing 100101, China; [4] College of Robotics, Beijing Union University, Beijing 100101, China
Source: Scientia Sinica Informationis, 2024, No. 7, pp. 1775-1792 (18 pages)
Funding: National Key R&D Program of China (Grant No. 2018AAA0100804); National Natural Science Foundation of China (Grant No. 62173237); Scientific Research Project of Beijing Union University (Grant Nos. ZK30202304, SK160202103, ZK50201911, ZK30202107); Open Fund of the State Key Laboratory of Satellite Navigation System and Equipment Technology (Grant No. CEPNT2022A01); Fundamental Research Funds for Liaoning Provincial Undergraduate Universities (Grant Nos. 20240177, 20240215); Shenyang Science and Technology Program (Grant No. 22-322-3-34).
Abstract: The collaborative adversarial game of unmanned aerial vehicles (UAVs) is becoming increasingly widespread and profound, especially as UAV swarms play an ever more important role in collaborative detection, global confrontation, strategic deception, and other confrontation tasks. Reliable and efficient UAV swarm game methods are currently a hot research topic. This paper introduces the counterfactual baseline concept into the UAV swarm adversarial environment and proposes a UAV swarm adversarial game method based on counterfactual multi-agent policy gradients (COMA). In a UAV confrontation environment with infinite continuous states and actions, incorporating the UAV dynamics model, we set up realistic attack conditions and reward functions and construct a UAV swarm adversarial game model based on multi-agent deep reinforcement learning. The red and blue UAV swarms adopt different adversarial game methods, and asymmetric adversarial experiments are conducted in the multi-agent particle environment (MPE). The experimental results show that the average cumulative reward converges to a Nash equilibrium. For the 4 vs. 8 adversarial decision-making scenario, the average hit rate of COMA is 39% and 17% higher than that of DQN and MADDPG, respectively, while the average win rate is 34% and 17% higher than that of DQN and MADDPG, respectively. Finally, the practicality and robustness of COMA for UAV swarm adversarial game tasks are confirmed through an in-depth analysis of its convergence and stability.
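The counterfactual baseline named in the abstract credits each agent by comparing the centralized critic's value of the joint action against the expectation obtained by marginalizing out only that agent's own action. The sketch below illustrates that computation for one agent with a discrete action set; the function name, argument shapes, and values are illustrative assumptions, not the paper's code.

```python
# Minimal sketch of the COMA counterfactual advantage for one agent a:
#   A_a(s, u) = Q(s, u) - sum_{u'_a} pi(u'_a | tau_a) * Q(s, (u_-a, u'_a))
# All names and shapes here are assumptions for illustration.
def counterfactual_advantage(q_values, policy_probs, chosen_action):
    """q_values:      Q(s, (u_-a, u'_a)) for each alternative action u'_a of
                      agent a, with the other agents' joint action u_-a fixed.
       policy_probs:  agent a's policy pi(u'_a | tau_a) over those actions.
       chosen_action: index of the action u_a that agent a actually executed."""
    # Counterfactual baseline: expected Q under agent a's own policy,
    # marginalizing out only agent a's action (other agents held fixed).
    baseline = sum(p * q for p, q in zip(policy_probs, q_values))
    return q_values[chosen_action] - baseline

# Example: 3 discrete actions, uniform policy. The baseline is
# (1.0 + 2.0 + 3.0) / 3 = 2.0, so the chosen action (Q = 3.0) gets
# a positive advantage of about 1.0.
adv = counterfactual_advantage([1.0, 2.0, 3.0], [1 / 3, 1 / 3, 1 / 3], 2)
```

Because the baseline depends only on agent a's own policy, subtracting it does not bias the policy gradient but isolates each agent's individual contribution, which is what lets COMA assign credit in a swarm where all agents share one team reward.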
Keywords: UAV swarm; adversarial game; multi-agent; deep reinforcement learning; Nash equilibrium
Classification: TP18 [Automation and Computer Technology - Control Theory and Control Engineering]; V279 [Automation and Computer Technology - Control Science and Engineering]