Authors: Xue Junxiao [1]; Kong Xiangyan; Guo Yibo; Lu Aiguo; Li Jian; Wan Xi; Xu Mingliang [2]
Affiliations: [1] School of Software Engineering, Zhengzhou University, Zhengzhou 450002, China; [2] School of Information Engineering, Zhengzhou University, Zhengzhou 450001, China; [3] No.709 Research Institute of China Shipbuilding Industry Corporation, Wuhan 430070, China
Source: Journal of Computer-Aided Design & Computer Graphics (《计算机辅助设计与图形学学报》), 2021, No. 7, pp. 1102-1112 (11 pages)
Funding: National Natural Science Foundation of China (62036010, 61822701); Program for Science and Technology Innovation Talents in Universities of Henan Province (18HASTIT020).
Abstract: To address the obstacle avoidance problem of carrier-based aircraft in the highly heterogeneous, dynamic aircraft-carrier deck operation scenario, an obstacle avoidance method combining a prediction algorithm with deep reinforcement learning is proposed. The method comprises a scene model, a reward model, and a trajectory prediction model. First, the carrier deck scenario is modeled in terms of the agent's state and action spaces. Then, the least-squares method is used to predict the positions of dynamic obstacles in the scene in real time, and a deep reinforcement learning algorithm containing this path prediction module, the environment-prediction deep Q-network (PDQN), is constructed. Finally, the algorithm is applied to achieve dynamic obstacle avoidance in the carrier deck operation scenario. Simulation experiments were carried out using the Python plotting library Matplotlib. The experimental results show that, compared with methods such as Q-learning and SARSA, the proposed method improves accuracy by 15%–25%, shortens path length by 9%–39%, raises the average reward by 30%–100%, converges 1–2 times faster, and reduces the standard deviation of the accuracy after training stabilizes by 2%–50%.
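The record does not include the authors' code. As a minimal sketch of the least-squares trajectory-prediction step named in the abstract, assuming an obstacle's recent positions are observed at known timestamps and each coordinate is fitted independently, one might write the following; the function name predict_obstacle_position, the linear fit degree, and the sample data are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def predict_obstacle_position(times, positions, t_next, degree=1):
    """Fit a least-squares polynomial to an obstacle's recent
    (t, x) and (t, y) observations and extrapolate to t_next.

    times:     1-D sequence of observation timestamps
    positions: (N, 2) sequence of observed (x, y) coordinates
    Returns the predicted (x, y) at time t_next.
    """
    times = np.asarray(times, dtype=float)
    positions = np.asarray(positions, dtype=float)
    # np.polyfit solves the least-squares fitting problem per axis.
    coeff_x = np.polyfit(times, positions[:, 0], degree)
    coeff_y = np.polyfit(times, positions[:, 1], degree)
    return np.polyval(coeff_x, t_next), np.polyval(coeff_y, t_next)

# Example: an obstacle (e.g., a deck vehicle) observed at four steps,
# moving roughly linearly; predict its position one step ahead.
t = [0.0, 1.0, 2.0, 3.0]
xy = [(0.0, 0.0), (1.1, 0.4), (2.0, 0.9), (3.1, 1.3)]
print(predict_obstacle_position(t, xy, t_next=4.0))
```

In a PDQN-style setup, such a predicted obstacle position could be appended to the agent's state observation before the Q-network update, but the exact state encoding and network architecture are specific to the paper.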
Classification Code: TP391.41 (Automation and Computer Technology / Computer Application Technology)