Authors: LI Wentao (李文韬), FANG Feng (方峰), WANG Zhenya (王振亚), ZHU Yichao (朱奕超), PENG Dongliang (彭冬亮)
Affiliations: [1] School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China; [2] China Academy of Aerospace Science and Innovation, Beijing 100076, China
Source: Acta Aeronautica et Astronautica Sinica (《航空学报》), 2024, Issue 17, pp. 214-228 (15 pages)
Funding: Fundamental Research Funds for the Provincial Universities of Zhejiang (GK209907299001-021)
摘 要:针对局部信息可观测的双机编队空战协同奖励难以量化设计、智能体协同效率低、机动决策效果欠佳的问题,提出了一种引入混合超网络改进多智能体深度确定性策略梯度(MADDPG)的空战机动决策方法。采用集中式训练-分布式执行架构,满足单机智能体在局部观测数据下对于全局最优机动决策的训练需求。在为各单机设计兼顾局部快速引导和全局打击优势的奖励函数基础上,引入混合超网络将各单机估计的Q值进行单调非线性混合得到双机协同的全局策略Q值,指导分布式Actor网络更新参数,解决多智能体深度强化学习中信度分配难的问题。大量仿真结果表明,相较于典型的MADDPG方法,该方法能够更好地引导各单机做出符合全局协同最优的机动决策指令,且拥有更高的对抗胜率。In the case of two Unmanned Combat Aerial Vehicles(UCAVs)cooperative air combat with local observa⁃tion,there are the problems such as hard-to-design collaborative rewards,low collaboration efficiency and poor decision-making effect.To solve these problems,an intelligent maneuver decision-making method is proposed based on the improved the Multi-Agent Deep Deterministic Policy Gradient(MADDPG)with hybrid hyper network.A Central⁃ized Training with Decentralized Execution(CTDE)architecture is adopted to meet the training requirements of global coordinated maneuvering decision in the situation of single agent with local observation.A reward function is designed for the UCAV agent by considering both the local reward of fast guidance for obtaining attack advantage and the global reward for winning air combat.Then,a hybrid hyper network is introduced to mix the estimated Q values of each agent monotonically and nonlinearly to obtain the global policy value function.By using the global policy value func⁃tion,the decentralized Actor network update parameters to solve the problem of credit assignment in multi-agent deep reinforcement learning.Simulation results show that compared with the traditional MADDPG method,the proposed method can produce the optimal global cooperative maneuver commands for achieving better coordination perfor⁃mance,and can obtain a higher winning rate with the same agent opponent.
Keywords: Unmanned Combat Aerial Vehicle (UCAV); air combat maneuver decision-making; Multi-Agent Deep Deterministic Policy Gradient (MADDPG); hybrid hypernetwork; centralized training with decentralized execution
CLC number: V249.12 (Aeronautical and Astronautical Science and Technology: Flight Vehicle Design)