Authors: ZHANG Qinglong (张青龙); ZHAO Bin (赵斌)[1]; XU Xinpeng (许新鹏)
Affiliation: [1] Institute of Precision Guidance and Control, Northwestern Polytechnical University, Xi'an 710072, China
Source: Journal of Astronautics (《宇航学报》), 2024, No. 8, pp. 1281-1289 (9 pages)
Funding: National Natural Science Foundation of China (62373307); Fundamental Research Funds for the Central Universities (G2022KY0608).
Abstract: A deep reinforcement learning guidance method that uses angle-only measurements and accounts for the seeker field-of-view (FOV) angle constraint is proposed for the guidance law design problem of infrared-guided missiles intercepting maneuvering targets. First, the interception guidance problem is formulated as a Markov decision process, and a deep reinforcement learning guidance model is built on the twin delayed deep deterministic policy gradient (TD3) algorithm, with the first-order autopilot characteristics of the missile fully taken into account. Second, a composite reward function is designed that satisfies the seeker FOV angle constraint while balancing energy consumption against guidance accuracy, and the deep reinforcement learning guidance law is trained in typical scenarios. Comparative simulations and Monte Carlo simulations are then carried out with the target performing different maneuvers. The simulation results show that, using only the angle information measured by the infrared seeker, the method hits the target with high accuracy while satisfying the FOV angle constraint and the overload command saturation constraint, and that it is strongly robust to different target maneuvers.
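Keywords: deep reinforcement learning; maneuvering target; field-of-view constraint; angle-only measurement; infrared guidance; missile interception
Classification: V448.13 [Aerospace Science and Technology: Flight Vehicle Design]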
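
The abstract names three ingredients without giving their formulas: a first-order autopilot, a saturated overload command, and a composite reward that covers the FOV constraint, energy consumption, and guidance accuracy. The Python sketch below shows one way such pieces might fit together. It is not the authors' formulation: the function names, the weights, the limits, and the use of the line-of-sight rate as an accuracy proxy are all assumptions made for illustration.

```python
import math


def saturate(cmd: float, limit: float) -> float:
    """Clip an acceleration command to +/- limit (overload saturation)."""
    return max(-limit, min(limit, cmd))


def autopilot_step(accel: float, accel_cmd: float, tau: float, dt: float) -> float:
    """One explicit-Euler step of a first-order lag: d(accel)/dt = (accel_cmd - accel) / tau."""
    return accel + dt * (accel_cmd - accel) / tau


def composite_reward(
    look_angle: float,        # seeker look angle, rad
    accel_cmd: float,         # commanded acceleration, m/s^2
    los_rate: float,          # line-of-sight rate, rad/s (angle-only information)
    fov_limit: float = math.radians(45.0),  # hypothetical FOV half-angle
    accel_max: float = 100.0,               # hypothetical command limit, m/s^2
    w_energy: float = 0.1,                  # hypothetical weight
    w_accuracy: float = 1.0,                # hypothetical weight
    fov_penalty: float = 10.0,              # hypothetical penalty
) -> tuple[float, bool]:
    """Return (reward, fov_violated) for one step; the form is illustrative only."""
    # Energy term: penalize large normalized commands.
    energy_cost = w_energy * (accel_cmd / accel_max) ** 2
    # Accuracy proxy (assumption): a small LOS rate suggests a near-collision course.
    accuracy_cost = w_accuracy * abs(los_rate)
    # FOV term: fixed penalty once the look angle leaves the field of view.
    fov_violated = abs(look_angle) > fov_limit
    reward = -(energy_cost + accuracy_cost)
    if fov_violated:
        reward -= fov_penalty
    return reward, fov_violated
```

In a training loop, the policy output would typically pass through `saturate` and then `autopilot_step` before the engagement kinematics are advanced, and an FOV violation flag like the one returned here could also be used to terminate the episode. Both choices are design options, not details taken from the paper.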