Cooperative decision-making for heterogeneous UAV swarm confrontation based on self-play reinforcement learning (Times cited: 1)

Authors: Ruichi YAN, Shuai LI, Chen WANG [1,2], Qi WU, Jinan SUN, Shikun ZHANG [2], Guangming XIE [1,4]

Affiliations: [1] Intelligent Biomimetic Design Lab, College of Engineering, Peking University, Beijing 100871, China; [2] National Engineering Research Center of Software Engineering, Peking University, Beijing 100871, China; [3] Institute of the Third Academy, China Aerospace Science & Industry Corporation, Beijing 100074, China; [4] Center for Multi-Agent Research, Institute for Artificial Intelligence, Peking University, Beijing 100871, China

Source: Scientia Sinica Informationis (《中国科学：信息科学》), 2024, No. 7, pp. 1709-1729 (21 pages)

Funding: Supported by the National Natural Science Foundation of China (Grant Nos. U22A2062, 12272008, 61973007).

Abstract: With the development of unmanned aerial vehicle (UAV) technology, UAV swarm confrontation has become a research hotspot both in China and abroad. Existing decision-making algorithms mainly address confrontation between homogeneous UAV swarms; when applied to more complex adversarial scenarios, they face problems such as reward functions that are difficult to design and decision-making that cannot meet real-time requirements. To address this, this paper studies real-time maneuver decision-making for heterogeneous UAV swarm confrontation. First, we construct an adversarial simulation environment for a leader-follower heterogeneous UAV swarm, in which leader and follower UAVs have different maneuvering and attack capabilities and carry different weight in deciding victory or defeat. Second, we propose a distributed cooperative maneuver control algorithm for UAV swarms based on multi-agent reinforcement learning, together with a policy training and optimization scheme that combines curriculum learning and self-play. A simple sparse reward combined with curriculum learning is sufficient to learn cooperative maneuver policies for the heterogeneous swarm. Introducing self-play makes the opponent UAVs' policies more targeted, which raises the intensity of the confrontation and further optimizes the maneuver policies so that they better match practical requirements. Finally, simulations validate the effectiveness and scalability of the proposed method.
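The abstract names three training ingredients: a simple sparse reward, curriculum learning, and self-play against increasingly targeted opponents. The Python sketch that follows illustrates how these pieces commonly fit together in general; it is not the authors' implementation. The win/loss rule (a side loses when its leader is destroyed or all its UAVs are down), the promotion threshold and window size, the "latest snapshot" sampling probability, and every name used (`sparse_reward`, `Curriculum`, `OpponentPool`) are hypothetical assumptions chosen for illustration.

```python
import random
from collections import deque
from dataclasses import dataclass, field


def sparse_reward(own_leader_alive: bool, enemy_leader_alive: bool,
                  own_alive: int, enemy_alive: int, done: bool) -> float:
    """Terminal-only reward: +1 for a win, -1 for a loss, 0 otherwise.

    ASSUMPTION (not from the paper): a side loses once its leader is
    destroyed or all of its UAVs are down.
    """
    if not done:
        return 0.0  # sparse: no shaping signal during the episode
    own_lost = (not own_leader_alive) or own_alive == 0
    enemy_lost = (not enemy_leader_alive) or enemy_alive == 0
    if enemy_lost and not own_lost:
        return 1.0
    if own_lost and not enemy_lost:
        return -1.0
    return 0.0  # draw, timeout, or mutual destruction


@dataclass
class Curriculum:
    """Promote to a harder opponent stage once the recent win rate is high enough.

    ASSUMPTION: stages, threshold and window are illustrative hyperparameters.
    """
    stages: list
    threshold: float = 0.8
    window: int = 100
    stage: int = 0
    results: deque = field(default_factory=deque)

    @property
    def current(self):
        return self.stages[self.stage]

    def record(self, won: bool) -> None:
        self.results.append(1 if won else 0)
        if len(self.results) > self.window:
            self.results.popleft()  # keep only the most recent `window` episodes
        ready = (len(self.results) == self.window
                 and sum(self.results) / self.window >= self.threshold)
        if ready and self.stage < len(self.stages) - 1:
            self.stage += 1
            self.results.clear()  # re-measure the win rate on the new stage


class OpponentPool:
    """Self-play opponent pool of frozen policy snapshots.

    ASSUMPTION: the latest snapshot is drawn with probability `latest_prob`;
    otherwise one is drawn uniformly from the whole pool, so the learner does
    not overfit to a single opponent.
    """

    def __init__(self, max_size: int = 10, latest_prob: float = 0.5, seed: int = 0):
        self.snapshots = []
        self.max_size = max_size
        self.latest_prob = latest_prob
        self.rng = random.Random(seed)

    def add(self, policy_params: dict) -> None:
        # Shallow copy, so reassigning keys in the live dict does not alter the snapshot.
        self.snapshots.append(dict(policy_params))
        if len(self.snapshots) > self.max_size:
            self.snapshots.pop(0)  # drop the oldest snapshot

    def sample(self) -> dict:
        if self.rng.random() < self.latest_prob:
            return self.snapshots[-1]
        return self.rng.choice(self.snapshots)
```

In a hypothetical training loop, the learner would play against `curriculum.current` (for example, static and then scripted opponents) and call `curriculum.record(...)` after each episode; once the final stage is reached, opponents would instead come from `pool.sample()`, with `pool.add(...)` called periodically to freeze the current policy. Keeping the reward terminal-only mirrors the abstract's point that a simple sparse reward can suffice when the curriculum controls task difficulty.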

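Keywords: swarm confrontation; cooperative decision-making; self-play; multi-agent reinforcement learning; unmanned aerial vehicle (UAV)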

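CLC numbers: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]; V279 [Automation and Computer Technology: Control Science and Engineering]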
