Authors: Xue Junxiao [1]; Kong Xiangyan; Guo Yibo; Lu Aiguo; Li Jian; Wan Xi; Xu Mingliang [2]
Affiliations: [1] School of Software Engineering, Zhengzhou University, Zhengzhou 450002, China; [2] School of Information Engineering, Zhengzhou University, Zhengzhou 450001, China; [3] No.709 Research Institute of China Shipbuilding Industry Corporation, Wuhan 430070, China
Source: Journal of Computer-Aided Design & Computer Graphics (《计算机辅助设计与图形学学报》), 2021, No. 7, pp. 1102-1112 (11 pages)
Funding: National Natural Science Foundation of China (62036010, 61822701); Program for Science and Technology Innovation Talents in Universities of Henan Province (18HASTIT020).
Abstract: To address the obstacle avoidance problem of carrier-based aircraft in the highly heterogeneous, dynamic aircraft-carrier deck operation scenario, an obstacle avoidance method combining a prediction algorithm with deep reinforcement learning is proposed. The method comprises a scene model, a reward model, and a trajectory prediction model. First, the carrier deck scenario is modeled in terms of the agent's state and action spaces. Then, the least-squares method is used to predict the positions of dynamic obstacles in the scene in real time, and a deep reinforcement learning algorithm containing this path prediction module, the environment-prediction deep Q-network (PDQN), is constructed. Finally, the algorithm is applied to achieve dynamic obstacle avoidance in the carrier deck operation scenario. Simulation experiments were carried out using the Python plotting library Matplotlib. The experimental results show that, compared with methods such as Q-learning and SARSA, the proposed method improves accuracy by 15%–25%, shortens path length by 9%–39%, raises the average reward by 30%–100%, converges 1–2 times faster, and reduces the standard deviation of the accuracy after training stabilizes by 2%–50%.
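The record does not include the authors' code. As a minimal sketch of the least-squares trajectory-prediction step named in the abstract, assuming an obstacle's recent positions are observed at known timestamps and each coordinate is fitted independently, one might write the following; the function name predict_obstacle_position, the linear fit degree, and the sample data are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def predict_obstacle_position(times, positions, t_next, degree=1):
    """Fit a least-squares polynomial to an obstacle's recent
    (t, x) and (t, y) observations and extrapolate to t_next.

    times:     1-D sequence of observation timestamps
    positions: (N, 2) sequence of observed (x, y) coordinates
    Returns the predicted (x, y) at time t_next.
    """
    times = np.asarray(times, dtype=float)
    positions = np.asarray(positions, dtype=float)
    # np.polyfit solves the least-squares fitting problem per axis.
    coeff_x = np.polyfit(times, positions[:, 0], degree)
    coeff_y = np.polyfit(times, positions[:, 1], degree)
    return np.polyval(coeff_x, t_next), np.polyval(coeff_y, t_next)

# Example: an obstacle (e.g., a deck vehicle) observed at four steps,
# moving roughly linearly; predict its position one step ahead.
t = [0.0, 1.0, 2.0, 3.0]
xy = [(0.0, 0.0), (1.1, 0.4), (2.0, 0.9), (3.1, 1.3)]
print(predict_obstacle_position(t, xy, t_next=4.0))
```

In a PDQN-style setup, such a predicted obstacle position could be appended to the agent's state observation before the Q-network update, but the exact state encoding and network architecture are specific to the paper.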
Classification Code: TP391.41 (Automation and Computer Technology / Computer Application Technology)