Authors: JIANG Qingji; WANG Xiaogang[1]; BAI Yuliang[1]; LI Yu (School of Astronautics, Harbin Institute of Technology, Harbin 150001, China; Beijing Aerospace Technology Institute, Beijing 100074, China)
Institutions: [1] School of Astronautics, Harbin Institute of Technology, Harbin 150001, China; [2] Beijing Aerospace Technology Institute, Beijing 100074, China
Source: Journal of Astronautics, 2023, No. 6, pp. 851-862 (12 pages)
Fund: National Natural Science Foundation of China (U20B2005).
Abstract: To address the threat posed by interceptors to a reentry glide vehicle in the dive phase, an intelligent game-maneuvering strategy is proposed that can effectively evade interception while preserving impact accuracy. First, a reinforcement learning training environment is constructed, comprising the dynamics, kinematics, and guidance-law models of the reentry glide vehicle and typical interceptors. Second, the vehicle's motion parameters are taken as the state variables, the rates of change of the angle of attack and the bank angle are designed as the action variables, and the reward function is built from the interceptor's miss distance and the impact-point deviation, thereby transforming the game-maneuvering problem into a Markov decision process. To cope with the sparse-reward problem in deep reinforcement learning, the sampling probabilities of higher-priority samples and successful samples are increased to accelerate policy improvement; meanwhile, an adaptive action noise based on the task success rate is designed to improve the exploration mechanism of the deep deterministic policy gradient (DDPG) algorithm and raise the training efficiency of the intelligent game-maneuvering strategy. Finally, numerical simulations of typical scenarios show that the proposed strategy still achieves a success rate above 90% in generalization scenarios, verifying the effectiveness of the method and the generalization ability of the neural network.
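The abstract describes an adaptive exploration mechanism in which the DDPG action noise is tied to the recent task success rate. The following is a minimal sketch of that idea, not the authors' implementation: all names and the linear noise schedule (sigma_max, sigma_min, window) are illustrative assumptions, since the abstract does not specify the exact scaling law.

```python
# Sketch only: adaptive Gaussian action noise whose scale decreases as the
# rolling task success rate rises, as described qualitatively in the abstract.
from collections import deque
import numpy as np


class SuccessRateAdaptiveNoise:
    """Scale DDPG exploration noise by (1 - recent task success rate)."""

    def __init__(self, action_dim, sigma_max=0.3, sigma_min=0.02, window=100):
        self.action_dim = action_dim
        self.sigma_max = sigma_max            # noise level when success rate is 0 (assumed)
        self.sigma_min = sigma_min            # residual noise when success rate is 1 (assumed)
        self.outcomes = deque(maxlen=window)  # 1.0 = successful episode, 0.0 = failed

    def record(self, success: bool) -> None:
        """Log the outcome of a finished training episode."""
        self.outcomes.append(1.0 if success else 0.0)

    @property
    def success_rate(self) -> float:
        return float(np.mean(self.outcomes)) if self.outcomes else 0.0

    def sigma(self) -> float:
        # Assumed linear interpolation: higher success rate -> less exploration.
        return self.sigma_min + (self.sigma_max - self.sigma_min) * (1.0 - self.success_rate)

    def perturb(self, action: np.ndarray, low: float = -1.0, high: float = 1.0) -> np.ndarray:
        """Add Gaussian noise to the actor's action (here, the rates of change of
        the angle of attack and bank angle) and clip to the valid action range."""
        noisy = action + np.random.normal(0.0, self.sigma(), size=self.action_dim)
        return np.clip(noisy, low, high)


# Hypothetical usage inside a DDPG training loop:
#   noise = SuccessRateAdaptiveNoise(action_dim=2)
#   a = noise.perturb(actor(state))        # exploratory action at each step
#   noise.record(episode_was_successful)   # update the success window per episode
```

Under this sketch, early training (low success rate) keeps exploration wide, while a high recent success rate shrinks the noise so the learned maneuvering policy is exploited more consistently.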
Keywords: reentry glide vehicle; dive phase; reinforcement learning; game maneuvering; deep deterministic policy gradient
Classification: TJ765.3 (Armament Science and Technology: Weapon Systems and Operation Engineering)