Intelligent Maneuvering Decision of Unmanned Combat Aircraft Based on LSTM-Dueling DQN

Cited by: 5

Authors: Hu Dongyuan, Yang Rennong[1], Zuo Jialiang[1], Zheng Wanze, Zhao Yu, Zhang Qiang[1] (Air Force Engineering University, Xi'an 710051, China)

Affiliation: [1] Air Force Engineering University, Xi'an 710051

Source: Tactical Missile Technology, 2021, No. 6, pp. 97-104 (8 pages)

Abstract: To address the inability of unmanned combat aircraft to make intelligent decisions in one-on-one autonomous air combat, a deep reinforcement learning method is introduced to build a tactical decision framework for unmanned combat aircraft and to solve for the maneuver commands of the confronting agents. First, a flight motion model and a missile attack zone model are established to form a basic one-on-one air combat environment. Second, eight motion variables are used to construct the agent's continuous state space, and reward and penalty functions are designed from the real-time attack-zone calculation, so that decisions emerge from the interaction between the agent and the combat environment. Finally, a long short-term memory (LSTM) network is combined with fully connected layers to build the agent's value network and target network. The networks are trained on decision samples drawn from the memory bank to fit the value function, enabling the agent to make a decision in any state. Simulation tests show that, in typical cases, the agent effectively perceives the air combat situation, the actions chosen by the algorithm accumulate and maintain the air combat advantage of the unmanned combat aircraft and complete the strike on the target, and the decision time meets real-time requirements.
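The abstract names three training ingredients: a memory bank of decision samples, a dueling value network, and a separate target network. A minimal pure-Python sketch of how these pieces usually fit together in Dueling DQN follows. It is not the authors' implementation. The LSTM and fully connected layers that would produce the value V(s) and the advantages A(s, a) are omitted. The mean-subtracted aggregation is the standard Dueling DQN form and is assumed here, since the abstract does not state which aggregation the paper uses. The discrete maneuver indices, the reward values, and all names (`Transition`, `ReplayMemory`, `dueling_q`, `td_target`) are hypothetical. Only the 8-variable state comes from the source.

```python
import random
from collections import deque
from typing import List, NamedTuple, Sequence

STATE_DIM = 8  # the abstract states that 8 motion variables form the state


class Transition(NamedTuple):
    state: Sequence[float]       # 8 motion variables at time t
    action: int                  # index into a hypothetical discrete maneuver set
    reward: float                # assumed to be shaped from the attack-zone result
    next_state: Sequence[float]  # 8 motion variables at time t + 1
    done: bool                   # True if the engagement ended at this step


class ReplayMemory:
    """Fixed-size memory bank of decision samples; the oldest are dropped first."""

    def __init__(self, capacity: int) -> None:
        self.buffer: deque = deque(maxlen=capacity)

    def push(self, t: Transition) -> None:
        if len(t.state) != STATE_DIM or len(t.next_state) != STATE_DIM:
            raise ValueError("state must contain 8 motion variables")
        self.buffer.append(t)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement breaks temporal correlation.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self) -> int:
        return len(self.buffer)


def dueling_q(value: float, advantages: Sequence[float]) -> List[float]:
    """Combine the value stream V(s) with the advantage stream A(s, a).

    Assumed aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a),
    which keeps V and A identifiable.
    """
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def td_target(reward: float, next_q_target: Sequence[float],
              gamma: float, done: bool) -> float:
    """Regression target y = r + gamma * max_a' Q_target(s', a'), or r if terminal."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)
```

In a full training loop, a mini-batch drawn with `ReplayMemory.sample` would be fed through the online network for Q(s, a) and through the target network for `next_q_target`. The squared difference to `td_target` would then be minimized, and the target network would be synchronized with the online network periodically. For example, `dueling_q(1.0, [0.5, -0.5, 0.0])` gives `[1.5, 0.5, 1.0]`, whose mean equals V(s) = 1.0 by construction.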

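Keywords: unmanned combat aircraft; air combat confrontation; maneuver decision; deep reinforcement learning; value function search; long short-term memory network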

CLC number: V325 [Aeronautical and Astronautical Science and Technology: Man-Machine and Environmental Engineering]
