Reinforcement learning-based trajectory generation method for departure flight procedure

Authors: SONG Ge (宋歌); HAN Pengfei (韩鹏飞); LUO Yuxiang (罗钰翔)

Affiliation: [1] Air Traffic Management College, Civil Aviation Flight University of China, Guanghan, Sichuan 618307, China

Source: Journal of Computer Applications (《计算机应用》), 2024, Issue S01 (supplement), pp. 355-362 (8 pages)

Funding: Civil Aviation Administration of China (CAAC) Safety Capability Building Project (MHAQ2022008, MHAQ2022004).

Abstract: The design of modern flight procedures is constrained by terrain, obstacles, airspace, aircraft performance and other factors, so the design process requires extensive evaluation of whether individual design details are valid. Once a procedure has been designed, professional test pilots must still fly it in a simulator and in a real aircraft to verify its feasibility, which consumes substantial human and financial resources. Without targeted analysis and evaluation before the flight test, test costs rise and the real-aircraft test phase carries safety risks; at present there is no more effective alternative for verifying the safety and feasibility of a procedure. To address these problems, a deep reinforcement learning method is proposed that automatically generates departure trajectories for verifying the validity and feasibility of a flight procedure while satisfying flight procedure design specifications. First, a basic flight dynamics model accounting for flight performance and obstacle clearance was established from aerodynamic principles, and a three-dimensional visual training platform was built with the Unity3D engine. Second, on the PyTorch deep learning framework, the ML-Agents reinforcement learning platform was used to build test-flight training models for each phase of flight, with scenes and reward functions designed for four objectives: takeoff, turning, cruising and landing. Taking the flight test of a departure procedure as an example, a PBN (Performance Based Navigation) departure procedure at Xiamen Gaoqi Airport was used for training and validation, and the DTW (Dynamic Time Warping) distance was used to quantify the deviation between the generated trajectory and the nominal trajectory. The experimental results show that this deviation stays within the limits of the obstacle clearance protection area. Experiments with the trained model on other departure procedures further show that it generalizes well.
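The abstract states only that the flight dynamics model is built from aerodynamic principles and accounts for flight performance and obstacle clearance. As a minimal sketch of what such a model can look like, assuming a standard three-degree-of-freedom point-mass formulation in still air with a parabolic drag polar, the code below advances the aircraft state by one explicit Euler step and checks clearance against obstacles. The class and function names, all parameter values, and the circular obstacle representation are hypothetical and not taken from the paper.

```python
import math
from dataclasses import dataclass

G = 9.80665  # standard gravitational acceleration, m/s^2


@dataclass
class State:
    x: float      # east position, m
    y: float      # north position, m
    h: float      # altitude, m
    v: float      # true airspeed, m/s
    gamma: float  # flight-path angle, rad (positive = climbing)
    psi: float    # heading, rad, measured clockwise from north


@dataclass
class Aircraft:
    # Hypothetical narrow-body-like values, for illustration only.
    mass: float = 60000.0     # kg
    wing_area: float = 122.6  # m^2
    cd0: float = 0.025        # zero-lift drag coefficient
    k: float = 0.045          # induced-drag factor in CD = CD0 + k * CL^2


def step(s: State, ac: Aircraft, thrust: float, cl: float, bank: float,
         rho: float = 1.225, dt: float = 0.1) -> State:
    """Advance a 3-DOF point-mass model by one explicit Euler step (still air,
    thrust aligned with the velocity vector)."""
    q = 0.5 * rho * s.v ** 2                             # dynamic pressure, Pa
    lift = q * ac.wing_area * cl                         # N
    drag = q * ac.wing_area * (ac.cd0 + ac.k * cl ** 2)  # N, parabolic polar
    dv = (thrust - drag) / ac.mass - G * math.sin(s.gamma)
    dgamma = (lift * math.cos(bank) - ac.mass * G * math.cos(s.gamma)) / (ac.mass * s.v)
    dpsi = lift * math.sin(bank) / (ac.mass * s.v * math.cos(s.gamma))
    return State(
        x=s.x + s.v * math.cos(s.gamma) * math.sin(s.psi) * dt,
        y=s.y + s.v * math.cos(s.gamma) * math.cos(s.psi) * dt,
        h=s.h + s.v * math.sin(s.gamma) * dt,
        v=s.v + dv * dt,
        gamma=s.gamma + dgamma * dt,
        psi=s.psi + dpsi * dt,
    )


def clears_obstacles(s: State, obstacles, moc: float) -> bool:
    """Return False if the aircraft is inside any (hypothetical) circular obstacle
    footprint (x, y, radius, top) while below top + moc; otherwise True."""
    for ox, oy, radius, top in obstacles:
        if math.hypot(s.x - ox, s.y - oy) <= radius and s.h < top + moc:
            return False
    return True


if __name__ == "__main__":
    ac = Aircraft()
    s = State(x=0.0, y=0.0, h=150.0, v=90.0, gamma=math.radians(5.0), psi=0.0)
    for _ in range(100):  # 10 s of simulated flight with constant inputs
        s = step(s, ac, thrust=180000.0, cl=0.9, bank=math.radians(15.0))
    print(s, clears_obstacles(s, [(500.0, 800.0, 300.0, 120.0)], moc=75.0))
```

In a reinforcement learning setup such as the one described, the policy's actions could plausibly be mapped onto the thrust, lift-coefficient and bank inputs, with the clearance check feeding the reward; how the authors actually parameterized actions and rewards is not stated in the abstract.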

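Keywords: modern flight procedure design; deep reinforcement learning; trajectory generation; hierarchical reinforcement learning; multi-dimensional dynamic time warping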
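The abstract uses the DTW distance to quantify how far a generated trajectory deviates from the nominal one, and the keywords specify multi-dimensional DTW. The exact variant, point metric and any normalization used by the authors are not given here; the sketch below assumes the common "dependent" form, in which 3D positions are compared jointly with Euclidean distance. The function name and the coordinate convention are illustrative assumptions.

```python
import math
from typing import Sequence

Point = Sequence[float]  # assumed convention: (east_m, north_m, altitude_m)


def dtw_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Dependent multi-dimensional DTW: accumulated Euclidean distance along the
    optimal monotone alignment of two trajectories of possibly different length."""
    if not a or not b:
        raise ValueError("both trajectories must contain at least one point")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated distance aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # all dimensions compared jointly
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


if __name__ == "__main__":
    nominal = [(0.0, 0.0, 0.0), (0.0, 1000.0, 60.0), (0.0, 2000.0, 120.0)]
    generated = [(0.0, 0.0, 0.0), (5.0, 500.0, 30.0),
                 (10.0, 1000.0, 62.0), (8.0, 2000.0, 118.0)]
    print(dtw_distance(generated, nominal))
```

The accumulated cost grows with the number of aligned points, so comparing it with a protection-area width in metres would require some normalization (for example by warping-path length); how the authors related DTW distance to the protection-area limits is not stated in the abstract.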

CLC number: TP391.9 [Automation and Computer Technology: Computer Application Technology]
