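Authors: XIAO You-gang (肖友刚) [1]; JIN Sheng-cheng (金升成); MAO Xiao (毛晓); WU Guo-hua (伍国华); LU Zhi-feng (陆志沣) (School of Traffic & Transportation Engineering, Central South University, Changsha, Hunan 410018, China; Shanghai Academy of Spaceflight Technology, Shanghai 201109, China)
Affiliations: [1] School of Traffic & Transportation Engineering, Central South University, Changsha, Hunan 410018, China; [2] Shanghai Electro-Mechanical Engineering Institute (上海机电工程研究所), Shanghai 201109, China
Source: Control Theory & Applications (《控制理论与应用》), 2024, No. 6, pp. 990-998 (9 pages)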
Abstract: To solve the missile-target allocation problem for naval ship air and missile defense in adversarial environments, this study proposes a deep reinforcement learning algorithm that incorporates an attention mechanism. First, a multi-type missile-target allocation model for the ship is constructed, and the problem is formulated as a Markov decision process that accounts for multi-wave target interception. Next, a reinforcement learning policy network is built on an encoder-decoder architecture: targets are encoded with a multi-head attention mechanism, and the decoder combines the global embedding of all targets with the embeddings of individual targets to produce a reliable missile-target allocation scheme. Finally, simulation experiments are carried out on the profit of the allocation schemes, the computation time, and the training process of the policy network. The experimental results show that the algorithm generates high-profit missile-target allocation schemes, that computation time on large-scale problems is reduced by 10%~94% relative to the baseline algorithms, and that the policy network converges quickly and stably.