Improved three-dimensional A* algorithm for real-time path planning based on reinforcement learning  Cited by: 5


Authors: REN Zhi (任智); ZHANG Dong (张栋)[1,2]; TANG Shuo (唐硕)

Affiliations: [1] School of Astronautics, Northwestern Polytechnical University, Xi'an 710072, Shaanxi, China; [2] Shaanxi Key Laboratory of Space Vehicle Design, Xi'an 710072, Shaanxi, China

Source: Systems Engineering and Electronics (《系统工程与电子技术》), 2023, No. 1, pp. 193-201 (9 pages)

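Funding: Supported by the Key Program of the National Natural Science Foundation of China (61933010) and the National Natural Science Foundation of China (61903301).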

Abstract: Online path planning for flight vehicles places high demands on both the real-time performance of the algorithm and the optimality of its results. To address this, the three-dimensional A* algorithm is improved using a reinforcement learning method. First, a shrinkage factor is introduced to improve how heuristic information is weighted in the cost function, which improves the algorithm's time performance. Second, a model is established to measure the change in the algorithm's real-time performance and in the optimality of its results; combined with the deep deterministic policy gradient (DDPG) method, the action-state and reward functions are designed, and the shrinkage factor is optimized through training. Finally, the improved three-dimensional A* algorithm is verified by simulation in multiple scenarios. The simulation results show that the improved algorithm effectively improves time performance while preserving the optimality of the resulting path.
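The abstract does not give the form of the shrinkage factor, so the following is only a minimal sketch of the general idea it builds on: A* on a 3D grid whose cost function weights the heuristic term, f = g + weight * h. The function name `weighted_astar_3d`, the 26-connected grid, the Euclidean heuristic, and the constant `weight` are all assumptions made for illustration. In the paper the factor is tuned by DDPG training, which is not reproduced here.

```python
import heapq
import itertools
import math


def weighted_astar_3d(start, goal, blocked, shape, weight=1.0):
    """A* on a 3D grid with 26-connected moves and f = g + weight * h.

    `weight` is a hypothetical stand-in for a tuned heuristic weighting;
    weight = 1.0 gives ordinary A*, larger values search more greedily.
    Returns (path, cost, number_of_expanded_nodes).
    """
    def h(p):
        # Straight-line distance to the goal (never overestimates here).
        return math.dist(p, goal)

    moves = [d for d in itertools.product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]
    g = {start: 0.0}
    parent = {start: None}
    open_heap = [(weight * h(start), start)]
    closed = set()
    expanded = 0

    while open_heap:
        _, node = heapq.heappop(open_heap)
        if node in closed:
            continue  # stale heap entry
        closed.add(node)
        expanded += 1
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1], g[goal], expanded
        for d in moves:
            nxt = tuple(a + b for a, b in zip(node, d))
            if nxt in closed or nxt in blocked:
                continue
            if not all(0 <= c < s for c, s in zip(nxt, shape)):
                continue
            cand = g[node] + math.sqrt(sum(c * c for c in d))
            if cand < g.get(nxt, math.inf):
                g[nxt] = cand
                parent[nxt] = node
                heapq.heappush(open_heap, (cand + weight * h(nxt), nxt))
    return None, math.inf, expanded


if __name__ == "__main__":
    shape = (6, 6, 6)
    # A wall at x == 2 with a gap along y == 5 (illustrative scenario only).
    wall = {(2, y, z) for y in range(5) for z in range(6)}
    for w in (1.0, 2.0):
        path, cost, n = weighted_astar_3d((0, 0, 0), (5, 0, 0), wall, shape, w)
        print(f"weight={w}: cost={cost:.3f}, expanded={n}")
```

Running the script with both weights shows the trade-off the abstract describes: the weight changes how many nodes are expanded and may change the path cost. Choosing the weighting so that time improves without degrading the path is the part the paper assigns to reinforcement learning.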

Keywords: improved A* algorithm; shrinkage factor; reinforcement learning; deep deterministic policy gradient; online path planning

Classification number: TJ765 [Ordnance Science and Technology: Weapon Systems and Utilization Engineering]

 
