Authors: 黄昱舟 (HUANG Yuzhou); 胡庆玉 (HU Qingyu); 熊华乔 (XIONG Huaqiao) — The 710 Research Institute of CSSC, Yichang 443000, China
Affiliation: [1] The 710 Research Institute of China State Shipbuilding Corporation Limited, Yichang 443000, Hubei, China
Source: Ship Science and Technology (《舰船科学技术》), 2024, No. 24, pp. 92-96 (5 pages)
Abstract: A lightweight improved Q-learning algorithm is proposed for the underactuated AUV global path planning problem. A distance reward function is designed to accelerate learning and improve algorithm stability. The combination of an epsilon-greedy strategy and a Softmax strategy provides a mechanism to balance exploration and exploitation, and the action set is simplified according to the AUV's motion constraints to reduce computation time. Simulation results demonstrate that the proposed algorithm efficiently solves the AUV path planning problem, enhancing algorithm stability and applicability. Compared with traditional Q-learning, for short-distance tasks the learning efficiency is increased by 90%, the path length is reduced by 7.85%, and the number of turns is reduced by 14.29%; for long-distance tasks, the learning efficiency is improved by 67.5%, the path length is reduced by 6.10%, and the number of turns is reduced by 32.14%.
Keywords: autonomous underwater vehicle (AUV); path planning; Q-learning; Softmax strategy; distance reward-penalty mechanism
Classification: U674.91 [Transportation Engineering — Ship and Waterway Engineering]; TP242 [Transportation Engineering — Ship and Ocean Engineering]
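The ingredients named in the abstract (a distance-shaped reward, an epsilon-greedy/Softmax mix for exploration, and a reduced action set) can be illustrated with a minimal tabular Q-learning sketch on a grid world. This is not the paper's implementation: the grid size, goal position, reward weights, hyperparameters, and the particular way the two exploration strategies are combined (Softmax sampling inside the epsilon branch) are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

GRID = 10                                      # hypothetical 10x10 grid world
GOAL = (9, 9)
ACTIONS = [(0, 1), (1, 0), (0, -1), (-1, 0)]   # reduced action set (no diagonals)

Q = np.zeros((GRID, GRID, len(ACTIONS)))
alpha, gamma, eps, tau = 0.1, 0.9, 0.1, 0.5    # assumed hyperparameters

def distance_reward(state):
    # distance-shaped reward: states closer to the goal get a smaller penalty
    return -0.01 * (abs(state[0] - GOAL[0]) + abs(state[1] - GOAL[1]))

def select_action(state):
    # with probability eps, explore by sampling a Softmax distribution over
    # Q-values; otherwise exploit greedily (one way to combine the strategies)
    q = Q[state]
    if rng.random() < eps:
        p = np.exp((q - q.max()) / tau)
        return int(rng.choice(len(ACTIONS), p=p / p.sum()))
    return int(q.argmax())

def step(state, a):
    # deterministic transition, clipped to the grid; goal yields a bonus
    dx, dy = ACTIONS[a]
    nxt = (min(max(state[0] + dx, 0), GRID - 1),
           min(max(state[1] + dy, 0), GRID - 1))
    r = 10.0 if nxt == GOAL else distance_reward(nxt)
    return nxt, r, nxt == GOAL

for _ in range(500):                           # training episodes
    s, done = (0, 0), False
    for _ in range(200):
        a = select_action(s)
        s2, r, done = step(s, a)
        # standard Q-learning update
        Q[s][a] += alpha * (r + gamma * Q[s2].max() - Q[s][a])
        s = s2
        if done:
            break
```

After training, a greedy rollout from the start cell follows `Q.argmax` toward the goal; the distance-shaped reward gives a learning signal on every step rather than only at the goal, which is the mechanism the abstract credits for the faster convergence.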