Cluster multi-target fire planning method based on PPO algorithm

Authors: QIN Hucheng, HUANG Yanyan[1], CHEN Tiande, ZHANG Han (School of Automation, Nanjing University of Science and Technology, Nanjing 210094, China)

Affiliation: [1] School of Automation, Nanjing University of Science and Technology, Nanjing 210094, Jiangsu, China

Source: Systems Engineering and Electronics (《系统工程与电子技术》), 2024, No. 11, pp. 3764-3773 (10 pages)

Funding: Supported by the China State Shipbuilding Innovation Fund (中船创新基金, KJB2023012).

Abstract: To address multi-target firepower planning in defensive combat scenarios under highly dynamic battlefield conditions, a firepower planning method based on the proximal policy optimization (PPO) algorithm is proposed. With the goal of maximizing combat effectiveness, the reinforcement learning reward function is designed around four aspects: ammunition consumption, combat effect, combat cost, and combat time. To account for the influence of the historical decision sequence on current planning, a neural network is designed under the Actor-Critic framework with a long short-term memory (LSTM) network at its core and is trained with PPO. The trained reinforcement learning agent then makes sequential decisions, generating a coherent series of firepower plans in real time from the situation at each decision stage. Simulation results show that the agent can perform multi-target firepower planning under highly dynamic conditions and that it has a clear advantage in computational efficiency over other algorithms.
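The abstract names two ingredients that can be illustrated compactly: a reward built from four aspects, and the PPO training objective. The sketch below is only an illustration under stated assumptions. The `StageOutcome` fields, the linear weighted form of `stage_reward`, and all weight values are hypothetical and are not taken from the paper; `ppo_clip_loss` is the standard PPO clipped surrogate, written in plain Python for readability rather than as the authors' implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class StageOutcome:
    """Hypothetical per-stage quantities, each assumed normalized to [0, 1]."""
    ammo_used: float   # fraction of available ammunition expended
    damage: float      # fraction of target threat neutralized (combat effect)
    cost: float        # normalized cost of the weapons fired
    time_used: float   # fraction of the engagement window consumed


def stage_reward(o, w_ammo=0.2, w_damage=0.5, w_cost=0.2, w_time=0.1):
    """Illustrative reward: credit combat effect, penalize the other three terms.

    The weights and the linear form are assumptions, not values from the paper.
    """
    return (w_damage * o.damage
            - w_ammo * o.ammo_used
            - w_cost * o.cost
            - w_time * o.time_used)


def ppo_clip_loss(new_logps, old_logps, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate, negated so that lower is better."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        # Probability ratio between the updated policy and the data-collecting policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to [1 - eps, 1 + eps] to limit the policy update.
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the unclipped and clipped objectives.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)
```

In a setup like the one the abstract describes, the per-stage rewards would feed an advantage estimate from the Critic, and the Actor would be updated by minimizing the clipped loss; the LSTM core is omitted here to keep the sketch dependency-free.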

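Keywords: multi-target firepower planning; proximal policy optimization (PPO) algorithm; long short-term memory (LSTM) network; sequential decision-making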

CLC number: TP273 [Automation and Computer Technology: Detection Technology and Automatic Equipment]

 
