Deep Reinforcement Learning Guidance Method Considering the Field-of-View Angle Constraint of Passive Detection


Authors: ZHANG Qinglong, ZHAO Bin [1], XU Xinpeng (Institute of Precision Guidance and Control, Northwestern Polytechnical University, Xi'an 710072, China)

Affiliation: [1] Institute of Precision Guidance and Control, Northwestern Polytechnical University, Xi'an 710072, China

Source: Journal of Astronautics (宇航学报), 2024, No. 8, pp. 1281-1289 (9 pages)

Funding: National Natural Science Foundation of China (62373307); Fundamental Research Funds for the Central Universities (G2022KY0608).

Abstract: To address guidance law design for infrared-guided missiles intercepting maneuvering targets, a deep reinforcement learning guidance method is proposed that uses angle-only measurements and accounts for the seeker field-of-view (FOV) angle constraint. First, the interception guidance problem is formulated as a Markov decision process, and a deep reinforcement learning guidance model is established based on the twin delayed deep deterministic policy gradient (TD3) algorithm, with the first-order autopilot dynamics of the missile taken fully into account. Second, a composite reward function is designed that satisfies the seeker FOV angle constraint while balancing energy consumption against guidance accuracy, and the guidance law is trained in typical scenarios. Comparative simulations and Monte Carlo simulations are then carried out with the target performing different maneuvers. The results show that, using only the angle information measured by the infrared seeker, the method enables the missile to hit the target with high accuracy while satisfying both the FOV angle constraint and the overload (acceleration) command saturation constraint, and that it is robust to different target maneuvers.
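The abstract does not give the form of the reward function or the autopilot model, so the following is only a minimal sketch of how the two ingredients it names could look. Everything in it is an assumption made for illustration, not the authors' design: the function names `autopilot_step` and `composite_reward`, the weights, the quartic FOV penalty, the use of line-of-sight rate as an accuracy proxy, and the terminal bonus based on a simulator-side miss distance.

```python
import math


def autopilot_step(a_achieved, a_cmd, tau, dt, a_max):
    """Advance a first-order autopilot lag, da/dt = (a_cmd - a) / tau, by dt.

    Hypothetical helper: the time constant tau and limit a_max are assumed inputs.
    """
    # Saturate the commanded acceleration (overload command saturation).
    a_cmd = max(-a_max, min(a_max, a_cmd))
    # Exact discretization of the first-order lag for a constant command.
    alpha = 1.0 - math.exp(-dt / tau)
    return a_achieved + alpha * (a_cmd - a_achieved)


def composite_reward(look_angle, fov_limit, a_cmd, a_max, los_rate,
                     done=False, miss_distance=None,
                     w_fov=1.0, w_energy=0.1, w_los=0.5,
                     hit_radius=1.0, hit_bonus=10.0, violation_penalty=10.0):
    """Illustrative per-step reward; all weights and shapes are assumptions.

    look_angle and los_rate are angle-only quantities a passive seeker can
    measure. miss_distance is assumed to come from the training simulator
    at episode end, not from the seeker.
    """
    ratio = abs(look_angle) / fov_limit
    if ratio >= 1.0:
        # Target has left the seeker field of view: fixed penalty.
        return -violation_penalty
    # Soft penalty that grows steeply as the look angle nears the FOV limit.
    r_fov = -w_fov * ratio ** 4
    # Energy term: penalize large normalized acceleration commands.
    r_energy = -w_energy * (a_cmd / a_max) ** 2
    # Accuracy proxy: a small line-of-sight rate suggests a collision course.
    r_los = -w_los * abs(los_rate)
    reward = r_fov + r_energy + r_los
    if done and miss_distance is not None and miss_distance <= hit_radius:
        # Terminal bonus for an intercept within the assumed hit radius.
        reward += hit_bonus
    return reward
```

In a training loop of this kind, the policy output would typically be passed through `autopilot_step` before the engagement kinematics are integrated, and `composite_reward` would be evaluated once per step. The relative sizes of `w_fov`, `w_energy`, and `w_los` control the trade-off between constraint margin, energy consumption, and accuracy that the abstract describes.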

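Keywords: deep reinforcement learning; maneuvering target; field-of-view constraint; angle-only measurement; infrared guidance; missile interception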

Classification number: V448.13 [Aeronautical and Astronautical Science and Technology: Flight Vehicle Design]

 
