Multi-energy dynamic soaring trajectory optimization method based on reinforcement learning


Authors: ZHANG Yunfei [1,2]; WANG Honglun; ZHANG Menghua [1,2]; GONG Yinan

Affiliations: [1] School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China; [2] The Science and Technology on Aircraft Control Laboratory, Beihang University, Beijing 100191, China; [3] Hiwing Aviation General Equipment Co., Ltd., Beijing 100074, China

Source: Journal of Northwestern Polytechnical University, 2025, No. 1, pp. 128-139 (12 pages)

Abstract: A trajectory optimization method based on deep reinforcement learning is proposed for dynamic soaring of unmanned aerial vehicles (UAVs). The method jointly exploits gradient wind energy and solar energy, and introduces obstacle constraints to simulate complex obstacle environments. A neural network is first trained to approximate the trajectory-solving policy of the Gauss pseudospectral method; the trained policy is then improved with the twin delayed deep deterministic policy gradient (TD3) algorithm. This greatly improves real-time inference while addressing the difficulty that traditional optimal control algorithms have in coping with changing wind fields in dynamic soaring. The two classic modes of dynamic soaring are first verified in simulation, followed by Monte Carlo simulations that consider multiple energy sources. The results show that, within a single soaring cycle, the method obtains energy close to the optimal result, while real-time inference decision time is reduced by 91%. In changing wind field environments, the method adapts better than traditional methods.

Keywords: dynamic soaring; reinforcement learning; Gauss pseudospectral method; trajectory optimization

Classification No.: V249.1 [Aerospace Science and Technology: Flight Vehicle Design]
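
The abstract names the twin delayed deep deterministic policy gradient (TD3) algorithm as the policy improvement step but gives no implementation details. As an illustration only, the following is a minimal NumPy sketch of the standard TD3 critic target (target policy smoothing plus the minimum over two target critics). The function name `td3_target`, the callable arguments, and all hyperparameter values are assumptions for this sketch and are not taken from the paper; the pseudospectral pretraining stage is not shown.

```python
import numpy as np


def td3_target(reward, next_state, done,
               actor_target, critic1_target, critic2_target,
               gamma=0.99, noise_std=0.2, noise_clip=0.5,
               action_limit=1.0, rng=None):
    """Compute the TD3 critic target for a batch of transitions.

    All names and default values are illustrative assumptions.
    actor_target maps states (N, ds) to actions (N, da);
    each critic maps (states, actions) to Q-values of shape (N,).
    """
    rng = np.random.default_rng() if rng is None else rng

    # Target policy smoothing: perturb the target action with clipped noise.
    next_action = actor_target(next_state)
    noise = rng.normal(0.0, noise_std, size=next_action.shape)
    noise = np.clip(noise, -noise_clip, noise_clip)
    next_action = np.clip(next_action + noise, -action_limit, action_limit)

    # Clipped double Q-learning: take the smaller of the two target critics.
    q_min = np.minimum(critic1_target(next_state, next_action),
                       critic2_target(next_state, next_action))

    # Bootstrapped target; terminal transitions (done == 1) do not bootstrap.
    return reward + gamma * (1.0 - done) * q_min
```

Taking the minimum of two critics is what counteracts the value overestimation of a single-critic deterministic policy gradient; in a full training loop the actor would additionally be updated less often than the critics.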

 
