Authors: WU Jian-Fa; WANG Hong-Lun [1,3]; WANG Yan-Xiang; LIU Yi-Heng
Affiliations: [1] School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191; [2] Science and Technology on Space Intelligent Control Laboratory, Beijing Institute of Control Engineering, Beijing 100094; [3] Science and Technology on Aircraft Control Laboratory, Beihang University, Beijing 100191
Source: Acta Automatica Sinica, 2023, No. 2, pp. 272-287 (16 pages)
Funding: Supported by the National Natural Science Foundation of China (62173022, 61673042, 61175084).
Abstract: A reactive interfered fluid path planning framework based on deep reinforcement learning is proposed for unmanned aerial vehicles (UAVs) operating in complex 3D obstacle environments. The framework uses a constrained interfered fluid dynamical system algorithm as its underlying path planning method. Given the relative state between the UAV and each obstacle, together with the obstacle type, actor networks trained with the deep deterministic policy gradient algorithm generate the reaction and direction coefficients for that obstacle online. The total modulation matrix is then computed from these coefficients and used to modify the flight path, achieving reactive obstacle avoidance. A normative method for modeling deep reinforcement learning training environments, matched to the proposed path planning method, is also studied. Simulation results show that, at approximately equal path quality, the proposed method clearly outperforms an online path planning method based on predictive control in real-time performance.
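Keywords: unmanned aerial vehicle (UAV); reactive path planning; constrained interfered fluid dynamical system; deep reinforcement learning; training environment
Classification: V279 [Aerospace Science and Technology: Aircraft Design]; TP18 [Automation and Computer Technology: Control Theory and Control Engineering]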
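
The abstract describes correcting the flight path with a modulation matrix built from per-obstacle reaction and direction coefficients. This record does not give the equations, so the following is only a minimal sketch under stated assumptions: a single spherical obstacle, a modulation of the form M = I - n nᵀ / Γ^(1/ρ) + t nᵀ / Γ^(1/σ) with unit normal n and unit tangent t, and fixed values for ρ, σ and θ where the paper would use the output of a trained actor network. The function name `modulated_velocity` and all parameter names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def unit(a):
    length = math.sqrt(dot(a, a))
    return [x / length for x in a]

def modulated_velocity(pos, goal, center, radius, rho, sigma, theta, speed=1.0):
    """Assumed single-obstacle modulation; not the paper's exact formulation.

    rho, sigma: reaction coefficients (repulsive, tangential), assumed > 0.
    theta: direction coefficient, an angle selecting the tangent direction.
    pos must differ from both goal and center.
    """
    # Nominal flow: constant speed straight toward the goal.
    u = [speed * x for x in unit([g - p for g, p in zip(goal, pos)])]

    # Sphere obstacle function: gamma == 1 on the surface, > 1 outside.
    rel = [p - c for p, c in zip(pos, center)]
    gamma = dot(rel, rel) / radius ** 2
    n = unit(rel)  # outward unit normal

    # Orthonormal basis (t1, t2) of the tangent plane at pos.
    if abs(n[0]) + abs(n[1]) > 1e-12:
        t1 = unit([n[1], -n[0], 0.0])
    else:
        t1 = [1.0, 0.0, 0.0]  # fallback when n is parallel to the z axis
    t2 = cross(n, t1)
    t = [math.cos(theta) * a + math.sin(theta) * b for a, b in zip(t1, t2)]

    # Apply M to u without forming the matrix explicitly.
    nu = dot(n, u)
    repel = nu / gamma ** (1.0 / rho)
    swirl = nu / gamma ** (1.0 / sigma)
    return [ui - repel * ni + swirl * ti for ui, ni, ti in zip(u, n, t)]
```

Under these assumptions the normal component of the velocity vanishes on the obstacle surface, so the flow slides around the sphere instead of entering it, and the correction decays as Γ grows, so the nominal flow is recovered far from the obstacle.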