Authors: SUI Dong (隋东), DONG Jintao (董金涛) (College of Civil Aviation, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China)
Source: Journal of Safety and Environment (《安全与环境学报》), 2024, No. 3, pp. 1070-1078 (9 pages)
Funding: Civil Aviation Administration of China funded project (No. [2022]125); Nanjing University of Aeronautics and Astronautics Research and Practice Innovation Program (xcxjh20220710)
Abstract: To address the problem of flight conflict resolution on air routes, a resolution method based on relative-entropy inverse reinforcement learning is proposed. First, an inverse reinforcement learning algorithm based on relative entropy learns the implicit prior knowledge of air traffic controllers from historical flight trajectory data and expresses it quantitatively as a reward function. This reward function is then introduced into a deep-reinforcement-learning conflict resolution model, guiding training toward solutions similar to those of controllers. Experimental results show that the model learns controllers' prior knowledge and achieves a conflict resolution rate above 73% on the test set. The work offers a reference for reducing controller workload and improving air traffic control safety.

The primary objective of air traffic management is to ensure the safety of aircraft flights. Flight conflicts can lead to hazardous approaches or even collisions, resulting in severe consequences. Therefore, studying auxiliary tools that assist controllers in resolving flight conflicts becomes essential. This article aims to enhance the personalization level of control decision-support tools and improve controllers' acceptance of the conflict resolution solutions these tools provide. Firstly, this article adopts an inverse reinforcement learning method based on relative entropy to extract implicit controller instruction strategies from aircraft flight trajectory data and represent them as reward functions. The flight conflict resolution problem is then modeled as a Markov decision process, and a deep reinforcement learning method (the DQN algorithm) is employed to train the model guided by the aforementioned reward function. The objective is to enhance the success rate of the resolution model and the degree of strategy personalization. Additionally, the article introduces analysis indicators from two perspectives: safety and applicability. Finally, a simulation system based on the Base of Aircraft Data (BADA) database is used to generate 5000 flight conflict scenarios; 4000 are used for model training, and the remaining 1000 verify the effectiveness of the proposed method. Experimental results demonstrate that, under the guidance of a reward function incorporating controller strategies, the resolution model consistently improves both the success rate across flight conflict scenarios and the similarity to controller strategies. During the testing phase, the successful resolution rate exceeds 70%. This result validates that the inverse reinforcement learning method based on relative entropy effectively learns the empirical knowledge of controllers, thereby enhancing the efficiency and personalization level of the resolution model. These methods present a novel approach to studying and improving the lev…
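The core idea described in the abstract, recovering a reward function from controller (expert) trajectories via relative-entropy inverse reinforcement learning and then using it to score candidate manoeuvres, can be sketched as follows. This is an illustrative sketch only: the linear reward parameterization, feature dimensions, learning rate, and synthetic data are assumptions for demonstration and are not taken from the paper.

```python
import numpy as np

def relative_entropy_irl(expert_feats, sampled_feats, lr=0.1, iters=200):
    """Relative-entropy IRL sketch: learn linear reward weights theta so that
    expert (controller) trajectories score higher than baseline samples.

    expert_feats:  (N, d) trajectory feature vectors from expert demonstrations
    sampled_feats: (M, d) trajectory feature vectors from a baseline policy
    """
    d = expert_feats.shape[1]
    theta = np.zeros(d)
    mu_expert = expert_feats.mean(axis=0)        # expert feature expectation
    for _ in range(iters):
        # Importance weights proportional to exp(reward) of each sample;
        # subtract the max before exponentiating for numerical stability.
        scores = sampled_feats @ theta
        w = np.exp(scores - scores.max())
        w /= w.sum()
        mu_sampled = w @ sampled_feats           # reweighted sample expectation
        # Gradient of the relative-entropy IRL objective: match the
        # reweighted sample features to the expert features.
        theta += lr * (mu_expert - mu_sampled)
    return theta

def reward(theta, features):
    """Linear reward used to guide the downstream (e.g., DQN) resolution model."""
    return features @ theta
```

In the full method, the learned `reward` would replace or augment the environment reward inside the DQN training loop, so that resolution policies are pulled toward controller-like manoeuvres rather than only toward conflict-free ones.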
Keywords: safety engineering; air traffic control; flight conflict resolution; inverse reinforcement learning; deep reinforcement learning
Classification: X92 [Environmental Science and Engineering - Safety Science]