Research on an Intelligent Tactical Decision-Making Method for Beyond-Visual-Range Air Combat Based on DL-MCTS

Authors: SONG Qi (宋祺); ZUO Jialiang (左家亮) [1]; ZHANG Ying (张滢); YAN Mengda (闫孟达); WU Ao (吴傲); LI Leyan (李乐言) (Air Traffic Control and Navigation School, Air Force Engineering University, Xi’an 710051, China)

Affiliation: [1] Air Traffic Control and Navigation School, Air Force Engineering University, Xi’an 710051, China

Source: Journal of Ordnance Equipment Engineering (《兵器装备工程学报》), 2025, No. 2, pp. 145-156 (12 pages)

Abstract: Existing research on intelligent decision-making for beyond-visual-range (BVR) air combat focuses mostly on maneuver decisions, with comparatively little work on tactical decisions. To address the problems that maneuver decisions are hard to interpret and tactical decisions are hard to generate, an algorithm that combines deep learning (DL) with Monte Carlo Tree Search (MCTS) is proposed. An autonomous learning and decision-making framework is constructed for the air combat agent, combining offline tactical learning with online tactical decision-making to realize a DL-MCTS-based tactical decision-making method for BVR air combat. In the offline learning stage, historical engagement data and tactical theory are used to build a prior tactical planning database consisting of perception, strategy, and evaluation data sets, and deep neural networks are trained on it to give the agent three functional modules: a perceiver, a planner, and an evaluator. In the real-time confrontation stage, a dual-track mode is proposed in which tactical perception and decision-making run in parallel, and an adversarial game tree is built. MCTS fuses the agent’s three networks and performs selection, expansion, simulation, and backpropagation at each game node, searching in real time for the optimal strategy in the current situation. In a head-on attack experiment, the offline-trained agent shows basic decision-making capability; after 50 search iterations it eliminates the adversary’s first-shot missile advantage and gradually acquires launch conditions for its own missiles. The experimental results indicate that the method’s decisions are highly interpretable and that its decision speed is satisfactory.
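
The four MCTS steps named in the abstract (selection, expansion, simulation, backpropagation) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the selection rule, the network interfaces, or the air combat model, so the sketch assumes UCB1 selection, uniform random rollouts in place of the evaluator network, and a toy two-player game (a pile of stones, each player removes 1 to 3, whoever takes the last stone wins). The names `Node`, `ucb1`, `rollout`, and `mcts` are hypothetical.

```python
import math
import random


class Node:
    """One node of the adversarial game tree (toy game, not an air combat state)."""

    def __init__(self, stones, parent=None, move=None):
        self.stones = stones      # stones left for the player about to move
        self.parent = parent
        self.move = move          # move that led from the parent to this node
        self.children = []
        self.untried = [m for m in (1, 2, 3) if m <= stones]
        self.visits = 0
        self.wins = 0.0           # wins credited to the player who made `move`


def ucb1(child, parent_visits, c=1.4):
    """UCB1 score: average win rate plus an exploration bonus (assumed rule)."""
    exploit = child.wins / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def rollout(stones, rng):
    """Random playout; True if the player to move at `stones` takes the last stone."""
    turn = 0  # 0 = the player to move at the start of the playout
    while stones > 0:
        stones -= rng.randint(1, min(3, stones))
        if stones == 0:
            return turn == 0
        turn ^= 1
    return False  # pile already empty: the previous player took the last stone


def mcts(root_stones, iterations=2000, seed=0):
    """Return the most-visited move from the root after `iterations` searches."""
    rng = random.Random(seed)
    root = Node(root_stones)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes by UCB1.
        while not node.untried and node.children:
            node = max(node.children, key=lambda ch: ucb1(ch, node.visits))
        # 2. Expansion: add one untried child, if any remain.
        if node.untried:
            move = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.stones - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout from the new node.
        mover_won = not rollout(node.stones, rng)
        # 4. Backpropagation: alternate the winner's perspective up the tree.
        while node is not None:
            node.visits += 1
            node.wins += 1.0 if mover_won else 0.0
            mover_won = not mover_won
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move


if __name__ == "__main__":
    print("suggested move from a pile of 5:", mcts(5))
```

In this toy game the winning strategy is to leave the opponent a multiple of 4 stones, so with enough iterations the search should favor taking 1 stone from a pile of 5. In a DL-MCTS setting, a learned planner would plausibly guide which children are expanded and a learned evaluator would replace the random playout; how the paper actually wires these in is not stated in the abstract.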

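Keywords: beyond-visual-range air combat; tactical decision-making; intelligent decision-making; deep learning; Monte Carlo tree search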

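Classification: V323 [Aerospace Science and Technology: Human-Machine and Environmental Engineering]; E926 [Military Science: Military Equipment Science]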

 
