Authors: QIN Hucheng (秦湖程); HUANG Yanyan (黄炎焱)[1]; CHEN Tiande (陈天德); ZHANG Han (张寒)
Affiliation: [1] School of Automation, Nanjing University of Science and Technology, Nanjing 210094, Jiangsu, China
Source: Systems Engineering and Electronics (《系统工程与电子技术》), 2024, No. 11, pp. 3764-3773 (10 pages)
Funding: Supported by the CSSC Innovation Fund (中船创新基金), grant KJB2023012.
Abstract: To address multi-target firepower planning in defensive combat scenarios under highly dynamic battlefield situations, a firepower planning method based on the proximal policy optimization (PPO) algorithm is proposed. With maximizing combat effectiveness as the objective, the reinforcement learning reward function is designed from four aspects: ammunition consumption, combat effect, combat cost, and combat time. To account for the influence of the historical decision sequence on current planning, the neural network is built on the Actor-Critic framework with a long short-term memory (LSTM) network at its core and is trained with PPO. The trained reinforcement learning agent then makes sequential decisions, generating a series of coherent firepower plans in real time from the situation at each decision stage. Simulation results show that the agent achieves multi-target firepower planning under highly dynamic situations and that its advantage in computational efficiency over the other algorithms is pronounced.
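For readers unfamiliar with PPO, the sketch below shows the standard clipped surrogate objective that PPO maximizes when updating the actor. It is a generic illustration only, not the authors' implementation: the function name `ppo_clip_objective`, its argument names, and the default `clip_eps=0.2` are assumptions, and the record does not state the paper's network details, reward weights, or hyperparameters.

```python
import math


def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch (to be maximized).

    logp_new / logp_old: log-probabilities of the taken actions under the
    current and the data-collecting policy. advantages: advantage estimates.
    All names and the default clip_eps are illustrative assumptions.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio between the new and old policies.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the pessimistic (smaller) of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a positive advantage, the clip caps how much credit a large policy change can earn; with a negative advantage, the unclipped term is kept, so harmful changes are still penalized in full.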
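Keywords: multi-target firepower planning; proximal policy optimization algorithm; long short-term memory network; sequential decision-making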
Classification number: TP273 [Automation and Computer Technology: Detection Technology and Automation Devices]