检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
作 者:Jingyu Zhu Minchi Kuang Wenqing Zhou Heng Shi Jihong Zhu Xu Han
机构地区:[1]Department of Precision Instruments,Tsinghua University,Beijing 100084,China [2]Department of Computer Science and Technology,Tsinghua University,Beijing 100084,China [3]Chengdu Aircraft Design&Research Institute,Aviation Industry Corporation of China,Chengdu 610000,China
出 处:《Defence Technology(防务技术)》2024年第4期295-312,共18页Defence Technology
摘 要:Reinforcement learning has been applied to air combat problems in recent years,and the idea of curriculum learning is often used for reinforcement learning,but traditional curriculum learning suffers from the problem of plasticity loss in neural networks.Plasticity loss is the difficulty of learning new knowledge after the network has converged.To this end,we propose a motivational curriculum learning distributed proximal policy optimization(MCLDPPO)algorithm,through which trained agents can significantly outperform the predictive game tree and mainstream reinforcement learning methods.The motivational curriculum learning is designed to help the agent gradually improve its combat ability by observing the agent's unsatisfactory performance and providing appropriate rewards as a guide.Furthermore,a complete tactical maneuver is encapsulated based on the existing air combat knowledge,and through the flexible use of these maneuvers,some tactics beyond human knowledge can be realized.In addition,we designed an interruption mechanism for the agent to increase the frequency of decisionmaking when the agent faces an emergency.When the number of threats received by the agent changes,the current action is interrupted in order to reacquire observations and make decisions again.Using the interruption mechanism can significantly improve the performance of the agent.To simulate actual air combat better,we use digital twin technology to simulate real air battles and propose a parallel battlefield mechanism that can run multiple simulation environments simultaneously,effectively improving data throughput.The experimental results demonstrate that the agent can fully utilize the situational information to make reasonable decisions and provide tactical adaptation in the air combat,verifying the effectiveness of the algorithmic framework proposed in this paper.
关 键 词:Air combat MCLDPPO Interruption mechanism Digital twin Distributed system
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.222