Authors: SUI Dong (隋东), DONG Jintao (董金涛) (College of Civil Aviation, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China)
Source: Journal of Safety and Environment (《安全与环境学报》), 2024, No. 3, pp. 1070-1078 (9 pages)
Funding: Civil Aviation Administration of China funded project (No. [2022]125); Nanjing University of Aeronautics and Astronautics Research and Practice Innovation Program (xcxjh20220710)
Abstract: To address the problem of flight conflict resolution on air routes, a resolution method based on relative-entropy inverse reinforcement learning is proposed. First, an inverse reinforcement learning algorithm based on relative entropy learns the implicit prior knowledge of air traffic controllers from historical flight trajectory data and expresses it quantitatively as a reward function. This reward function is then introduced into a deep-reinforcement-learning conflict resolution model, guiding training toward solutions similar to those of controllers. Experimental results show that the model learns controllers' prior knowledge and achieves a conflict resolution rate above 73% on the test set. The work offers a reference for reducing controller workload and improving air traffic control safety.

The primary objective of air traffic management is to ensure the safety of aircraft flights. Flight conflicts can lead to hazardous approaches or even collisions, resulting in severe consequences. Therefore, studying auxiliary tools that assist controllers in resolving flight conflicts becomes essential. This article aims to enhance the personalization level of control decision-support tools and improve controllers' acceptance of the conflict resolution solutions these tools provide. Firstly, this article adopts an inverse reinforcement learning method based on relative entropy to extract implicit controller instruction strategies from aircraft flight trajectory data and represent them as reward functions. The flight conflict resolution problem is then modeled as a Markov decision process, and a deep reinforcement learning method (the DQN algorithm) is employed to train the model guided by the aforementioned reward function. The objective is to enhance the success rate of the resolution model and the degree of strategy personalization. Additionally, the article introduces analysis indicators from two perspectives: safety and applicability. Finally, a simulation system based on the Base of Aircraft Data (BADA) database is used to generate 5000 flight conflict scenarios; 4000 are used for model training, and the remaining 1000 verify the effectiveness of the proposed method. Experimental results demonstrate that, under the guidance of a reward function incorporating controller strategies, the resolution model consistently improves both the success rate across flight conflict scenarios and the similarity to controller strategies. During the testing phase, the successful resolution rate exceeds 70%. This result validates that the inverse reinforcement learning method based on relative entropy effectively learns the empirical knowledge of controllers, thereby enhancing the efficiency and personalization level of the resolution model. These methods present a novel approach to studying and improving the lev…
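The core idea described in the abstract, recovering a reward function from controller (expert) trajectories via relative-entropy inverse reinforcement learning and then using it to score candidate manoeuvres, can be sketched as follows. This is an illustrative sketch only: the linear reward parameterization, feature dimensions, learning rate, and synthetic data are assumptions for demonstration and are not taken from the paper.

```python
import numpy as np

def relative_entropy_irl(expert_feats, sampled_feats, lr=0.1, iters=200):
    """Relative-entropy IRL sketch: learn linear reward weights theta so that
    expert (controller) trajectories score higher than baseline samples.

    expert_feats:  (N, d) trajectory feature vectors from expert demonstrations
    sampled_feats: (M, d) trajectory feature vectors from a baseline policy
    """
    d = expert_feats.shape[1]
    theta = np.zeros(d)
    mu_expert = expert_feats.mean(axis=0)        # expert feature expectation
    for _ in range(iters):
        # Importance weights proportional to exp(reward) of each sample;
        # subtract the max before exponentiating for numerical stability.
        scores = sampled_feats @ theta
        w = np.exp(scores - scores.max())
        w /= w.sum()
        mu_sampled = w @ sampled_feats           # reweighted sample expectation
        # Gradient of the relative-entropy IRL objective: match the
        # reweighted sample features to the expert features.
        theta += lr * (mu_expert - mu_sampled)
    return theta

def reward(theta, features):
    """Linear reward used to guide the downstream (e.g., DQN) resolution model."""
    return features @ theta
```

In the full method, the learned `reward` would replace or augment the environment reward inside the DQN training loop, so that resolution policies are pulled toward controller-like manoeuvres rather than only toward conflict-free ones.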
Keywords: safety engineering; air traffic control; flight conflict resolution; inverse reinforcement learning; deep reinforcement learning
Classification: X92 [Environmental Science and Engineering - Safety Science]