Authors: Zijian HU, Xiaoguang GAO, Kaifang WAN, Neretin EVGENY, Jinliang LI
Affiliations: [1] School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710129, China; [2] School of Robotic and Intelligent Systems, Moscow Aviation Institute (National Research University), Moscow 125993, Russia; [3] Electromagnetic Space Operations and Applications Laboratory, The 29th Research Institute of China Electronics Technology Group Corporation, Chengdu 610036, China
Source: Chinese Journal of Aeronautics, 2023, Issue 5, pp. 377-391 (15 pages)
Funding: co-supported by the National Natural Science Foundation of China (Nos. 62003267 and 61573285); the Natural Science Basic Research Plan in Shaanxi Province of China (No. 2020JQ-220); the Open Project of Science and Technology on Electronic Information Control Laboratory, China (No. JS20201100339); and the Open Project of Science and Technology on Electromagnetic Space Operations and Applications Laboratory, China (No. JS20210586512).
Abstract: As an advanced combat weapon, Unmanned Aerial Vehicles (UAVs) have been widely used in modern warfare. In this paper, we formulate the Autonomous Navigation Control (ANC) problem of UAVs as a Markov Decision Process (MDP) and propose a novel Deep Reinforcement Learning (DRL) method that allows UAVs to perform dynamic target tracking tasks in large-scale unknown environments. To address the problem of limited training experience, the proposed Imaginary Filtered Hindsight Experience Replay (IFHER) generates successful episodes by reasonably imagining the target trajectory in each failed episode, thereby augmenting the experiences. The well-designed goal, episode, and quality filtering strategies ensure that only high-quality augmented experiences are stored, while the sampling filtering strategy of IFHER ensures that these stored augmented experiences are fully learned according to their high priorities. By training in a complex environment constructed from the parameters of a real UAV, the proposed IFHER algorithm improves the convergence speed by 28.99% and the convergence result by 11.57% compared to the state-of-the-art Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm. Testing experiments carried out in environments of varying complexity demonstrate the strong robustness and generalization ability of the IFHER agent. Moreover, the flight trajectory of the IFHER agent shows the superiority of the learned policy and the practical application value of the algorithm.
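As a reading aid, the sketch below illustrates the general hindsight-relabeling-with-filtering idea the abstract describes: a failed episode is relabeled with an imagined goal, passed through filters, and stored with a high sampling priority so it is learned from more often. All names, data layouts, filter signatures, and priority values here are hypothetical illustrations; this is not the paper's actual IFHER implementation.

```python
# Minimal sketch of HER-style relabeling with filtering and prioritized
# sampling. Everything below is an assumed, simplified illustration.
import random
from dataclasses import dataclass

@dataclass
class Transition:
    state: tuple       # agent observation
    action: tuple      # control command issued
    next_state: tuple  # observation after the action
    goal: tuple        # target the episode was conditioned on

def relabel_episode(episode, goal_filter, quality_filter):
    """Turn a failed episode into a 'successful' one by treating an
    imagined goal (here, the last achieved state) as the true goal."""
    imagined_goal = episode[-1].next_state  # hypothetical choice of goal
    if not goal_filter(imagined_goal):
        return None  # goal filtering: discard implausible imagined goals
    relabeled = [Transition(t.state, t.action, t.next_state, imagined_goal)
                 for t in episode]
    if not quality_filter(relabeled):
        return None  # quality filtering: keep only high-quality episodes
    return relabeled

class PrioritizedBuffer:
    """Toy prioritized buffer: augmented experiences are added with higher
    priority, so weighted sampling draws them more often."""
    def __init__(self):
        self.data = []
        self.priorities = []

    def add(self, transition, priority=1.0):
        self.data.append(transition)
        self.priorities.append(priority)

    def sample(self, k):
        return random.choices(self.data, weights=self.priorities,
                              k=min(k, len(self.data)))
```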
Keywords: Artificial intelligence; Autonomous navigation control; Deep reinforcement learning; Hindsight experience replay; UAV
Classification: V279 [Aeronautics and Astronautics Science and Technology, Aircraft Design]; V249