A Trajectory Planning Method Based on DQN Variable Dynamic Intelligent Decision-Making


Authors: MEI Zewei, LI Tianren, ZHU Jialin, SHAO Xingling [2,4], DING Tianyun, LIU Jun

Affiliations: [1] School of Instrument and Electronics, North University of China, Taiyuan 030051, Shanxi, China; [2] Key Laboratory of Instrumentation Science & Dynamic Measurement of Ministry of Education, North University of China, Taiyuan 030051, Shanxi, China; [3] Research and Development Center, China Academy of Launch Vehicle Technology, Beijing 100071, China; [4] School of Electrical and Control Engineering, North University of China, Taiyuan 030051, Shanxi, China

Source: Acta Armamentarii (兵工学报), 2024, No. 12, pp. 4395-4406 (12 pages)

Funding: National Natural Science Foundation of China (62173312, 61803348).

Abstract: An aerospace craft with insufficient aerodynamic force has difficulty sustaining an emergency lateral maneuver to safely avoid an obstacle. To address this problem, a trajectory planning method based on deep Q-learning network (DQN) variable dynamic intelligent decision-making is proposed. Based on the kinematic equations of the variable dynamic aerospace craft, a longitudinal guidance law driven by range error and a lateral guidance law based on line-of-sight angle deviation for obstacle avoidance are designed to correct the magnitude and sign of the bank angle in real time, ensuring terminal guidance accuracy and safe fly-around. At the intelligent decision level, the power-gear switching problem of the aerospace craft is formulated as a Markov decision process: the angle of attack, the Mach number, and the relative distance between the aerospace craft and the obstacle form the state space, the power gear of the aerospace craft serves as the action space, and a reward function accounting for collision probability and terminal-constraint error is designed. A DQN is then constructed to train the agent and obtain the optimal power gear. Simulation results show that the proposed algorithm enables the aerospace craft to improve its lateral obstacle-avoidance capability while satisfying the terminal constraints.
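The abstract formulates gear selection as a Markov decision process with state (angle of attack, Mach number, relative distance to the obstacle), a discrete power-gear action space, and a reward penalizing collision probability and terminal-constraint error, trained with a DQN. The paper's actual implementation is not reproduced here; the following is a minimal PyTorch sketch of that formulation only, in which the network sizes, the number of gears (N_GEARS), the reward weights (w1, w2), and the helper names QNet, select_gear, and dqn_update are illustrative assumptions rather than the authors' settings.

```python
# Minimal sketch of the DQN gear-selection formulation described in the abstract.
# All hyperparameters and names below are illustrative placeholders, not the paper's values.
import random

import torch
import torch.nn as nn

N_GEARS = 3  # hypothetical number of selectable power gears (action space)


class QNet(nn.Module):
    """Maps the 3-dimensional state (angle of attack, Mach number, relative
    distance to the obstacle) to one Q-value per power gear."""

    def __init__(self, state_dim=3, n_actions=N_GEARS):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, s):
        return self.net(s)


def reward(collision_prob, terminal_error, w1=1.0, w2=0.1):
    """Illustrative reward: penalize collision probability and terminal-constraint
    error (weights w1, w2 are assumptions, not taken from the paper)."""
    return -(w1 * collision_prob + w2 * terminal_error)


def select_gear(qnet, state, epsilon):
    """Epsilon-greedy gear selection over the discrete action space."""
    if random.random() < epsilon:
        return random.randrange(N_GEARS)
    with torch.no_grad():
        q = qnet(torch.as_tensor(state, dtype=torch.float32))
    return int(q.argmax())


def dqn_update(qnet, target_net, optimizer, batch, gamma=0.99):
    """One temporal-difference update on a replay-buffer batch
    (s, a, r, s', done), using a separate target network."""
    s, a, r, s2, done = (torch.as_tensor(x, dtype=torch.float32) for x in batch)
    q = qnet(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * target_net(s2).max(dim=1).values * (1.0 - done)
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch the environment dynamics (the variable dynamic aerospace-craft model and the guidance laws that correct the bank angle) are assumed to live outside these functions and simply supply the state, reward, and transition for each step.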

Keywords: aerospace craft; deep Q-learning network (DQN); variable dynamic; intelligent decision-making; trajectory planning

CLC Number: V448.235 (Aerospace Science and Technology: Flight Vehicle Design)

 
