Authors: Yaoming ZHOU, Fan YANG, Chaoyue ZHANG, Shida LI, Yongchao WANG
Affiliations: [1] School of Aeronautic Science and Engineering, Beihang University, Beijing 100191, China; [2] Key Laboratory of Industrial Control Technology, Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou 310027, China
Source: Chinese Journal of Aeronautics, 2024, No. 8, pp. 311-328 (18 pages); Chinese title: 中国航空学报(英文版)
Funding: Co-supported by the National Natural Science Foundation of China (No. 52272382), the Aeronautical Science Foundation of China (No. 20200017051001), and the Fundamental Research Funds for the Central Universities, China.
Abstract: Highly intelligent Unmanned Combat Aerial Vehicle (UCAV) formations are expected to show their strengths in Beyond-Visual-Range (BVR) air combat. Although Multi-Agent Reinforcement Learning (MARL) performs very well in cooperative decision-making, existing MARL algorithms struggle to converge quickly to an optimal strategy for UCAV formations in BVR air combat, where the confrontation is complicated and the reward is extremely sparse and delayed. To solve this problem, this paper proposes an Advantage Highlight Multi-Agent Proximal Policy Optimization (AHMAPPO) algorithm. First, at every step, AHMAPPO records the degree to which the best formation exceeds the average of the formations in parallel environments and carries out additional advantage sampling according to it. Then, the sampling result is introduced into the update of the actor network to improve its optimization efficiency. Finally, simulation results show that, compared with several state-of-the-art MARL algorithms, AHMAPPO obtains a better strategy using fewer sample episodes in the UCAV formation BVR air combat simulation environment built in this paper, which reflects the critical features of BVR air combat. AHMAPPO significantly increases the convergence efficiency of the strategy for UCAV formations in BVR air combat, with a maximum increase of 81.5% relative to other algorithms.
Keywords: Unmanned combat aerial vehicle (UCAV) formation; Decision-making; Beyond-visual-range (BVR) air combat; Advantage highlight; Multi-agent reinforcement learning (MARL)
Classification: V279 [Aerospace Science and Technology: Aircraft Design]; V249.1