Improved reinforcement learning algorithm for mobile robot path planning

Authors: ZHANG Wei [1,3,4]; CHU Zeyuan; YANG Yutao; WANG Wei

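Affiliations: [1] College of Aeronautical Engineering, Civil Aviation University of China (CAUC), Tianjin 300300, China; [2] College of Safety Science and Engineering, CAUC, Tianjin 300300, China; [3] Aviation Special Ground Equipment Research Base, CAAC, Tianjin 300300, China; [4] Key Laboratory of Smart Airport Theory and System, CAAC, Guangzhou 510470, China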

Source: Journal of Civil Aviation University of China, 2024, No. 5, pp. 59-65 (7 pages)

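Funding: Key Program of the Civil Aviation Joint Research Fund of the National Natural Science Foundation of China (U2033208); Tianjin Postgraduate Research and Innovation Project (2021YJSS122).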

Abstract: Paths planned by the traditional Q-learning algorithm suffer from poor smoothness, slow convergence, and low learning efficiency. To address these problems, this paper proposes an improved Q-learning algorithm for mobile robot path planning. First, the action set is selected according to the obstacle density and the relative position of the start point, which accelerates the convergence of Q-learning. Second, a continuous heuristic factor is added to the reward function; it is composed of the distance from the current point to the end point and the distances from the current point to all obstacles in the map and to the map boundary. Finally, a scale factor is introduced into the initialization of the Q-value table to give the mobile robot prior information about the environment. The proposed algorithm is verified by simulation on grid maps. The simulation results show that, compared with the traditional Q-learning algorithm, the improved algorithm converges markedly faster and adapts better to complex environments, which verifies its advantages.
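The ingredients named in the abstract (an action set, a heuristic term in the reward, and a scaled Q-value table initialization) can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything specific below is an assumption and not the authors' method: the 6x6 map, the fixed 8-neighbor action set, the `heuristic` function (negative goal distance plus a clearance bonus from the nearest obstacle or boundary, whereas the paper uses the distances to all obstacles), the weights `w_goal`, `w_clear` and `eta`, and the `scale` parameter used to seed the Q-value table.

```python
import math
import random

# Hypothetical 6x6 grid map (not from the paper): 0 = free cell, 1 = obstacle.
GRID = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
N = len(GRID)
START, GOAL = (0, 0), (N - 1, N - 1)
OBSTACLES = [(r, c) for r in range(N) for c in range(N) if GRID[r][c]]
# Fixed 8-neighbor action set (assumption; the paper selects the action set
# from obstacle density and start position, which is not reproduced here).
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1), (-1, -1), (-1, 1), (1, -1), (1, 1)]


def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])


def is_free(cell):
    r, c = cell
    return 0 <= r < N and 0 <= c < N and GRID[r][c] == 0


def heuristic(cell, w_goal=1.0, w_clear=0.5):
    """Assumed heuristic: higher when nearer the goal and farther from hazards."""
    r, c = cell
    to_boundary = min(r + 1, c + 1, N - r, N - c)  # steps to the first off-map index
    to_obstacle = min(dist(cell, o) for o in OBSTACLES)
    return -w_goal * dist(cell, GOAL) + w_clear * min(to_boundary, to_obstacle)


def step(state, action, eta=0.1):
    """Return (next_state, reward, done) for one deterministic move."""
    nxt = (state[0] + action[0], state[1] + action[1])
    if not is_free(nxt):
        return state, -10.0, False  # blocked: stay in place and pay a penalty
    if nxt == GOAL:
        return nxt, 100.0, True
    return nxt, -1.0 + eta * heuristic(nxt), False  # step cost plus heuristic term


def init_q(scale=1.0):
    """Seed Q(s, a) with the scaled heuristic of the cell that action a leads to."""
    q = {}
    for r in range(N):
        for c in range(N):
            for i, (dr, dc) in enumerate(ACTIONS):
                nxt = (r + dr, c + dc)
                q[(r, c), i] = scale * heuristic(nxt) if is_free(nxt) else -10.0
    return q


def train(episodes=3000, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = init_q()
    idx = range(len(ACTIONS))
    for _ in range(episodes):
        s = START
        for _ in range(200):  # cap on steps per episode
            if rng.random() < eps:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(idx, key=lambda i: q[s, i])
            s2, reward, done = step(s, ACTIONS[a])
            target = reward if done else reward + gamma * max(q[s2, i] for i in idx)
            q[s, a] += alpha * (target - q[s, a])
            s = s2
            if done:
                break
    return q


def greedy_path(q, limit=50):
    """Follow the greedy policy from START until GOAL or the step limit."""
    path, s = [START], START
    while s != GOAL and len(path) < limit:
        a = max(range(len(ACTIONS)), key=lambda i: q[s, i])
        s, _, _ = step(s, ACTIONS[a])
        path.append(s)
    return path


if __name__ == "__main__":
    print(greedy_path(train()))
```

Calling `greedy_path(train())` returns the cells visited by the greedy policy after training. Seeding the table with `init_q` means the first episodes already prefer moves toward the goal, which is the intuition behind giving the robot prior environment information; setting `scale=0.0` in this sketch removes that prior for the free moves.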

Keywords: reinforcement learning; path planning; heuristic reward function; Q-value initialization

Classification code: TP249 [Automation and Computer Technology: Detection Technology and Automatic Devices]
