Multi-Dimensional Decision-Making for UAV Air Combat Based on Hierarchical Reinforcement Learning

Cited by: 10

Authors: ZHANG Jiandong [1]; WANG Dinghan; YANG Qiming; SHI Guoqing [1]; LU Yi; ZHANG Yaozhong [1]

Affiliations: [1] School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710072, Shaanxi, China; [2] AVIC Shenyang Aircraft Design and Research Institute, Shenyang 110035, Liaoning, China

Source: Acta Armamentarii (兵工学报), 2023, No. 6, pp. 1547-1563 (17 pages)

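Funding: Natural Science Basic Research Program of Shaanxi Province (2022JQ-593); Key Research and Development Program of the Shaanxi Provincial Department of Science and Technology (2022GY-089).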

Abstract: To address the intelligent decision-making problem in UAV air combat, a multi-dimensional decision-making model for intelligent UAV air combat is established on a hierarchical reinforcement learning architecture. Autonomous air-combat decision-making is extended from single-dimensional maneuver decisions to multiple dimensions, including radar on/off switching, active jamming, formation change, target detection, target tracking, jamming avoidance, and weapon selection, so that the main stages of air combat are decided autonomously. To cope with the state-space complexity and low learning efficiency of the decision model after this dimension expansion, a group of meta-policies is trained and built by combining the Soft Actor-Critic algorithm with expert experience, and the traditional Option-Critic algorithm is improved: the policy termination function is redesigned and optimized to make policy switching more flexible and to achieve seamless switching among decision dimensions during air combat. Experimental results show that the model performs well in adversarial engagements across the multi-dimensional decisions of the full UAV air-combat process, and that it can control the agent to switch flexibly among jamming, search, strike, and evasion policies according to the battlefield situation, thereby improving on the performance of the traditional algorithms and the efficiency of solving complex decision-making problems.

Keywords: UAV air combat; multi-dimensional decision-making; hierarchical reinforcement learning; Soft Actor-Critic algorithm; Option-Critic algorithm

Classification code: V279 [Aerospace Science and Technology: Aircraft Design]

 
