Collaborative air combat maneuvering decision-making method based on graph convolutional deep reinforcement learning (cited by: 3)

Authors: OU Yang; GUO Zhengyu [2]; LUO Delin; MIAO Kehua (School of Aeronautics and Astronautics, Xiamen University, Xiamen 361102, China; China Air-to-Air Missile Research Institute, Luoyang 471000, China; National Key Laboratory of Space-Based Information Perception and Integration, Luoyang 471000, China)

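Affiliations: [1] School of Aeronautics and Astronautics, Xiamen University, Xiamen 361102 [2] China Air-to-Air Missile Research Institute, Luoyang 471000 [3] National Key Laboratory of Space-Based Information Perception and Integration, Luoyang 471000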

Source: Chinese Journal of Engineering (工程科学学报), 2024, No. 7, pp. 1227-1236 (10 pages)

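Funding: Xiamen Science and Technology Bureau, Xiamen Industry-University-Research Project (2023CXY0101); joint funding project of the National Key Laboratory of Space-Based Information Perception and Integration and the Aeronautical Science Foundation (20220001068001).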

Abstract: To address the decision-making problem in intelligent cooperative air combat among multiple unmanned aerial vehicles (UAVs), a multi-UAV cooperative air combat maneuvering decision-making method based on long short-term memory (LSTM) and dueling graph convolutional deep reinforcement learning is proposed. First, the multi-UAV cooperative air combat confrontation problem is described. Second, an LSTM network is introduced into the dueling Q network to process air combat information with strong temporal correlation. Next, a graph convolutional network is built as the communication basis between UAVs, a cooperative air combat training framework based on the LSTM and dueling graph convolutional deep reinforcement learning algorithm is proposed, and the training algorithm for cooperative air combat decision-making is designed. Two-versus-one air combat simulation results verify the effectiveness of the proposed method, which features fast decision-making, a stable learning process, and the ability to learn cooperative strategies that adapt to rapid changes in the air combat environment.

The effective implementation of multi-unmanned aerial vehicle (UAV) decision making and improvement in the efficiency of coordinated mission execution are currently the top priorities of air combat research. To solve the problem of multi-UAV cooperative air combat maneuvering confrontation, a multi-UAV cooperative air combat maneuvering confrontation decision-making method based on long short-term memory (LSTM) and dueling graph convolutional deep reinforcement learning is proposed. First, the problem of multi-UAV cooperative air combat maneuvering confrontation is described. Second, in the deep dueling Q network, the LSTM network is introduced to process air combat information with a strong temporal correlation. Further, a graph convolutional network is built as the communication basis between multiple UAVs, and a cooperative air combat training framework based on LSTM and a dueling graph convolutional deep reinforcement learning algorithm is proposed to improve convergence. In the proposed method, the communication problem between UAVs is transformed into a graph model, where each UAV is regarded as a node and the observation state of the UAV is regarded as the attribute of that node. The convolutional layer captures the cooperative relationship between nodes, and communication between UAVs is realized through information sharing. Subsequently, the extracted time-sequenced air combat feature information is input into the LSTM and deep dueling Q networks to evaluate action values. The LSTM network can process sequence information and encode historical states into its hidden state, so that the network better captures temporal dependencies and thus better predicts the value function of the current state. The simulation results show that when the opponent adopts a nonmaneuvering strategy, the UAV formation developed using the proposed method as the core decision-making strategy can learn a reasonable maneuvering strategy and cooperate to a certain extent when facing an opponent using a fixed st
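The two building blocks named in the abstract, graph-based information sharing between UAVs and dueling action-value estimation, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `graph_conv` and `dueling_q` are hypothetical, the graph step uses plain mean aggregation with no learned weights, the dueling step uses the standard mean-subtracted aggregation, and the LSTM stage is omitted entirely.

```python
from typing import List

Vector = List[float]


def graph_conv(features: List[Vector], adjacency: List[List[int]]) -> List[Vector]:
    """One parameter-free graph-convolution step (illustrative assumption).

    Each node (UAV) replaces its feature vector with the mean of its own
    features and those of its neighbours, i.e. a row-normalised (A + I) X.
    A trained network would also apply a learned weight matrix and a
    nonlinearity; both are left out here.
    """
    n = len(features)
    shared = []
    for i in range(n):
        # Self-loop plus every neighbour flagged in the adjacency matrix.
        group = [j for j in range(n) if j == i or adjacency[i][j]]
        dim = len(features[i])
        shared.append(
            [sum(features[j][k] for j in group) / len(group) for k in range(dim)]
        )
    return shared


def dueling_q(value: float, advantages: Vector) -> Vector:
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + adv - mean_adv for adv in advantages]


# Two UAVs that can communicate with each other (hypothetical observations).
observations = [[1.0, 0.0], [3.0, 2.0]]
links = [[0, 1], [1, 0]]
shared_features = graph_conv(observations, links)

# Hypothetical state value and per-action advantages for one UAV.
q_values = dueling_q(1.0, [0.0, 1.0, 2.0])
```

With two fully connected nodes, both UAVs end up holding the same averaged feature vector, which is the simplest form of the information sharing the abstract describes. In the framework described there, the shared features would then pass through the LSTM before reaching the dueling head.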

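Keywords: unmanned aerial vehicle (UAV); deep reinforcement learning; maneuver decision-making; multi-aircraft cooperation; air combat decision-making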

Classification number: TG142.71 [General Industrial Technology - Materials Science and Engineering]
