Reinforcement Learning-based Intelligent Guidance Law for Cooperative Attack of Multiple Missiles

Cited by: 24

Authors: CHEN Zhongyuan, WEI Wenshu [2], CHEN Wanchun [1]

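Affiliations: [1] School of Astronautics, Beihang University, Beijing 100191, China; [2] China Academy of Launch Vehicle Technology, Beijing 100076, China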

Source: Acta Armamentarii (兵工学报), 2021, No. 8, pp. 1638-1647 (10 pages)

Funding: 2021 "Excellent Hundred Talents" (卓越百人) Postdoctoral Support Program (B21042); National Defense Basic Scientific Research Program (JCKY2019204D001).

Abstract: To achieve cooperative attack of multiple missiles against a target and improve attack effectiveness, a reinforcement learning-based cooperative guidance law using a deep deterministic policy gradient (DDPG) neural network is proposed. The time-to-go estimation equation based on linear engagement dynamics is revised so that it is no longer restricted by the small-angle assumption, which improves the accuracy of the time-to-go estimate. The time-to-go error of each missile is taken as the coordination variable and, together with each missile's range-to-go, serves as the observation of the reinforcement learning algorithm. The reward function is constructed from the miss distance and the time-to-go error, and a reinforcement learning agent is generated by offline training. During closed-loop guidance, the agent generates guidance commands in real time that achieve simultaneous attack. Simulation results show that the proposed guidance law achieves simultaneous attack of multiple missiles on the target and, compared with a traditional cooperative guidance law, yields smaller miss distances and smaller attack time errors.
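The quantities named in the abstract (time-to-go, time-to-go error, range-to-go, reward) can be illustrated with a minimal sketch. It is not the paper's method. The time-to-go function below is the classical small-angle estimate for proportional navigation, which is the kind of estimate the paper says it revises; the revised equation is not given in this record. Defining the error as the deviation from the mean time-to-go, the quadratic reward, its weights, and all function names are assumptions made only for illustration.

```python
def time_to_go_pn(range_to_go, speed, lead_angle, nav_gain=3.0):
    """Classical small-angle time-to-go estimate under proportional navigation.

    lead_angle is the angle (rad) between the velocity vector and the line of
    sight. This is the baseline estimate, NOT the paper's revised equation.
    """
    correction = lead_angle ** 2 / (2.0 * (2.0 * nav_gain - 1.0))
    return range_to_go / speed * (1.0 + correction)


def time_to_go_errors(t_go):
    """Assumed coordination variable: deviation from the mean time-to-go."""
    mean = sum(t_go) / len(t_go)
    return [t - mean for t in t_go]


def observation(t_go_error, range_to_go):
    """Per-missile observation: time-to-go error and range-to-go."""
    return (t_go_error, range_to_go)


def reward(miss_distance, t_go_error, w_miss=1.0, w_time=1.0):
    """Hypothetical reward: penalise miss distance and time-to-go error."""
    return -(w_miss * miss_distance ** 2 + w_time * t_go_error ** 2)


if __name__ == "__main__":
    # Three missiles with made-up ranges (m), a common speed (m/s), lead angles (rad).
    ranges = [10000.0, 10500.0, 9800.0]
    angles = [0.10, 0.30, 0.20]
    t_go = [time_to_go_pn(r, 250.0, a) for r, a in zip(ranges, angles)]
    errors = time_to_go_errors(t_go)
    for e, r in zip(errors, ranges):
        print(observation(e, r))
```

In a DDPG setup, an observation of this form would feed the actor network and a reward of this kind would drive offline training. The paper's actual observation scaling and reward shape may differ.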

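Keywords: missile; cooperative guidance law; simultaneous attack; reinforcement learning; deep deterministic policy gradient (DDPG) algorithm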

Classification code: TJ765.31 [Armament Science and Technology: Weapon Systems and Utilization Engineering]
