Reinforcement Learning-Based Terminal Constrained Guidance Law for Air-to-Ground Missiles Based on Dimensionless Models

Original title: 基于无量纲模型的空地导弹强化学习制导律


Authors: HUANG Xiaoyang (黄晓阳); ZHOU Jun (周军)[1]; ZHAO Bin (赵斌)[1]; XU Xinpeng (许新鹏); SHEN Yuheng (沈昱恒)

Affiliations: [1] Institute of Precision Guidance and Control, Northwestern Polytechnical University, Xi'an 710072, China; [2] Shanghai Electro-Mechanical Engineering Institute, Shanghai 201109, China

Source: Journal of Astronautics (《宇航学报》), 2024, No. 9, pp. 1445-1455 (11 pages)

Funding: National Natural Science Foundation of China (62373307); Fundamental Research Funds for the Central Universities (G2022KY0608).

Abstract: To address terminal-angle-constrained guidance for air-to-ground missiles striking ground targets, a reinforcement learning terminal guidance method based on a dimensionless model and terminal rewards is proposed. First, a dimensionless missile-target relative motion model is established from the missile's flight kinematic equations; it reduces the size of the state and observation spaces of the reinforcement learning environment and improves network training efficiency for terminal-angle-constrained guidance. Second, a reinforcement learning strategy based on terminal rewards is constructed that jointly accounts for terminal hit accuracy and terminal attack angle accuracy without relying on a process (per-step) reward function, avoiding the reward sparsity problem found in conventional reinforcement learning guidance. Third, the deep deterministic policy gradient algorithm is used to train the terminal guidance law, with input optimization taken into account, in typical scenarios. Numerical simulations indicate that, compared with existing methods, the proposed method achieves higher hit accuracy and attack angle accuracy and significantly reduces the required overload, while overcoming the high computational resource usage and low learning efficiency of existing reinforcement learning guidance methods, which points to its potential for practical application.

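Keywords: deep reinforcement learning; dimensionless model; deep deterministic policy gradient algorithm; terminal reward function; attack angle constraint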

Classification No.: V448.13 [Aerospace Science and Technology: Flight Vehicle Design]
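
The abstract names two ingredients without giving their equations: a dimensionless relative motion model and a reward issued only at the end of an episode. The sketch below illustrates both under assumptions that are not taken from the paper: planar motion, a stationary target, constant missile speed V, range scaled by the initial range r0 (rho = r / r0), time scaled by r0 / V, and normal acceleration scaled by V**2 / r0. With that scaling V and r0 drop out of the kinematics, which is one way a state space can shrink. The reward weights, the function names, and the proportional navigation policy (a stand-in for a trained DDPG actor, which is not implemented here) are all hypothetical.

```python
import math


def step(rho, q, theta, u, dtau):
    """Advance the dimensionless planar kinematics by one explicit Euler step.

    Assumed scaling (not from the paper):
      rho   = r / r0         range scaled by initial range
      q     = line-of-sight angle, rad
      theta = flight-path angle, rad
      u     = a * r0 / V**2  scaled normal acceleration
      dtau  = dt * V / r0    scaled time step
    """
    sigma = theta - q  # lead angle between velocity vector and line of sight
    rho_next = rho - math.cos(sigma) * dtau
    q_next = q - math.sin(sigma) / rho * dtau
    theta_next = theta + u * dtau
    return rho_next, q_next, theta_next


def terminal_reward(rho_f, theta_f, theta_des, w_miss=1.0, w_angle=1.0):
    """Reward given only at episode end; both weights are hypothetical."""
    return -(w_miss * rho_f + w_angle * abs(theta_f - theta_des))


def pn_policy(rho, q, theta, gain=3.0, u_max=10.0):
    """Placeholder for a learned actor: u = gain * dq/dtau, clipped to +/- u_max."""
    u = -gain * math.sin(theta - q) / rho
    return max(-u_max, min(u_max, u))


def rollout(policy, q0, theta0, theta_des, dtau=5e-4, rho_stop=1e-3, max_steps=10_000):
    """Run one episode from rho = 1; no reward is computed until the episode ends."""
    rho, q, theta = 1.0, q0, theta0
    for _ in range(max_steps):
        # Stop on (near) impact, or once the range is no longer closing.
        if rho <= rho_stop or math.cos(theta - q) <= 0.0:
            break
        rho, q, theta = step(rho, q, theta, policy(rho, q, theta), dtau)
    return rho, theta, terminal_reward(rho, theta, theta_des)


if __name__ == "__main__":
    rho_f, theta_f, reward = rollout(
        pn_policy, q0=math.radians(-30.0), theta0=0.0, theta_des=math.radians(-60.0)
    )
    print(f"scaled miss {rho_f:.4f}, impact angle {math.degrees(theta_f):.1f} deg, "
          f"reward {reward:.3f}")
```

Proportional navigation does not enforce an impact angle, so the angle term of the reward stays nonzero in this run; a learned policy would be trained to drive both terms toward zero.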

 
