Intelligent air combat decision making and simulation based on deep reinforcement learning  Cited by: 16


Authors: ZHOU Pan (周攀); HUANG Jiangtao (黄江涛); ZHANG Sheng (章胜); LIU Gang (刘刚)[2]; SHU Bowen (舒博文); TANG Jigang (唐骥罡)

Affiliations: [1] Aerospace Technology Institute, China Aerodynamics Research and Development Center, Mianyang 621000, China; [2] China Aerodynamics Research and Development Center, Mianyang 621000, China; [3] School of Aeronautics, Northwestern Polytechnical University, Xi'an 710072, China

Source: Acta Aeronautica et Astronautica Sinica (《航空学报》), 2023, No. 4, pp. 94-107 (14 pages)

Funding: Provincial/ministerial-level project.

Abstract: Intelligent decision-making for aircraft air combat is a research hotspot among the world's military powers. To address the maneuvering decision problem for Unmanned Aerial Vehicles (UAVs) in close-range air combat games, an autonomous decision-making model for close-range UAV dogfighting based on deep reinforcement learning is proposed. The model adopts and improves a reward function that jointly considers attack-angle advantage, speed advantage, altitude advantage, and distance advantage. The improved reward function prevents the agent from being lured into crashing into the ground by the enemy aircraft, and it effectively guides the agent toward the optimal solution. To address the slow convergence caused by random sampling in reinforcement learning, a value-based method for ranking the priority of samples in the experience pool is designed, which significantly accelerates convergence while still ensuring that the algorithm converges. The decision-making model is verified on a human-machine confrontation simulation platform, and the results show that it can gain the upper hand over both the expert system and human pilots in close-range air combat.

Keywords: air combat; autonomous decision-making; deep reinforcement learning; TD3 algorithm; sparse reward

Classification code: V249.12 [Aerospace Science and Technology: Aircraft Design]
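
Illustrative note: the abstract mentions ranking experience-pool samples by a value-based priority instead of sampling them uniformly at random. The record does not give the paper's actual formula, so the following is only a minimal sketch of the general idea. The class name `PrioritizedReplayBuffer`, the use of the magnitude of a value estimate as the priority, and the `eps` offset are all assumptions made for illustration, not the authors' method.

```python
import random


class PrioritizedReplayBuffer:
    """Toy experience pool that samples transitions in proportion to a priority.

    Hypothetical sketch: the priority rule below is an assumption, not the
    rule used in the paper.
    """

    def __init__(self, capacity, eps=1e-3, seed=None):
        self.capacity = capacity
        self.eps = eps          # keeps every priority strictly positive
        self.items = []         # stored transitions
        self.priorities = []    # one positive priority per transition
        self.rng = random.Random(seed)

    def add(self, transition, value_estimate):
        # Assumption: priority grows with the magnitude of a value estimate.
        priority = abs(value_estimate) + self.eps
        if len(self.items) >= self.capacity:
            # Evict the oldest sample (FIFO) when the pool is full.
            self.items.pop(0)
            self.priorities.pop(0)
        self.items.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size):
        # Draw with replacement, probability proportional to priority.
        return self.rng.choices(self.items, weights=self.priorities, k=batch_size)

    def ranked(self):
        # Transitions ordered from highest to lowest priority.
        order = sorted(
            range(len(self.items)),
            key=lambda i: self.priorities[i],
            reverse=True,
        )
        return [self.items[i] for i in order]
```

In a sketch like this, high-priority transitions are replayed more often than under uniform sampling, while the `eps` offset keeps low-priority transitions reachable so that no sample is starved entirely.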

 
