Authors: TIAN Xiao-hang (田晓航), HUO Xin (霍鑫) [1], ZHOU Dian-le (周典乐), ZHAO Hui (赵辉) [1]
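Affiliations: [1] Control and Simulation Center, Harbin Institute of Technology, Harbin 150080, China; [2] College of Advanced Interdisciplinary Studies, National University of Defense Technology, Changsha 410073, China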
Source: Control and Decision (《控制与决策》), 2023, No. 12, pp. 3345-3353 (9 pages)
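Funding: Natural Science Foundation of Heilongjiang Province (LH2021F025); Fundamental Research Funds for the Central Universities (HIT.NSRIF202242); Heilongjiang Province Education Reform Project (SJGY20200185); Harbin Institute of Technology Graduate Education Reform Core Project (21HX0401).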
Abstract: When Q-learning is applied to path planning, the randomness of action selection and the limited magnitude of each Q-table update cause the agent to repeatedly explore suboptimal states and paths, which slows convergence. To address this problem, the pheromone mechanism of the ant colony algorithm is introduced and a method for optimizing the search range is proposed, reducing the number of ineffective explorations by the agent. In addition, to make the early iterations more goal-directed, a Q-table initialization method is designed based on the positional relationship between the current grid cell and the goal and on the characteristics of the agent's action selection. To give the algorithm a suitable exploration probability in the early, middle, and late stages of a run, a method for dynamically adjusting the exploration factor according to pheromone concentration is designed. Finally, simulation experiments in multiple environments of different sizes and characteristics verify the effectiveness and feasibility of the proposed algorithm.
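The abstract names its mechanisms but gives no formulas, so the following is only a rough sketch of two of them: a goal-aware Q-table initialization and an exploration factor driven by pheromone concentration. Every constant, reward value, and update rule here (the `bonus` value, the `eps_max / (1 + strongest)` rule, the `1 / len(path)` deposit, the evaporation rate `rho`) is an assumption made for illustration and is not taken from the paper. The paper's search-range optimization is not reproduced.

```python
import random

# Illustrative sketch only: all constants and formulas below are assumptions,
# not the ones published in the paper.
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def init_q_table(rows, cols, goal, bonus=0.1):
    """Goal-aware initialization: actions that move closer to the goal start higher."""
    q = {}
    for r in range(rows):
        for c in range(cols):
            for i, (dr, dc) in enumerate(ACTIONS):
                closer = manhattan((r + dr, c + dc), goal) < manhattan((r, c), goal)
                q[(r, c), i] = bonus if closer else 0.0
    return q


def exploration_factor(pheromone, eps_max=0.9, eps_min=0.05):
    """Shrink epsilon as the strongest pheromone trail grows (assumed rule)."""
    strongest = max(pheromone.values())
    return max(eps_min, eps_max / (1.0 + strongest))


def step(state, action, rows, cols, obstacles, goal):
    """Apply an action; hitting a wall or obstacle keeps the agent in place."""
    nxt = (state[0] + ACTIONS[action][0], state[1] + ACTIONS[action][1])
    if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols) or nxt in obstacles:
        return state, -1.0, False
    if nxt == goal:
        return nxt, 10.0, True
    return nxt, -0.1, False


def train(rows, cols, start, goal, obstacles=frozenset(), episodes=200,
          alpha=0.5, gamma=0.9, rho=0.1, seed=0):
    rng = random.Random(seed)
    q = init_q_table(rows, cols, goal)
    pheromone = {(r, c): 0.0 for r in range(rows) for c in range(cols)}
    for _ in range(episodes):
        eps = exploration_factor(pheromone)  # fixed within an episode
        state, path, done = start, [start], False
        for _ in range(4 * rows * cols):  # step cap per episode
            if rng.random() < eps:
                action = rng.randrange(len(ACTIONS))
            else:
                action = max(range(len(ACTIONS)), key=lambda a: q[state, a])
            nxt, reward, done = step(state, action, rows, cols, obstacles, goal)
            best_next = 0.0 if done else max(q[nxt, a] for a in range(len(ACTIONS)))
            # Standard one-step Q-learning update.
            q[state, action] += alpha * (reward + gamma * best_next - q[state, action])
            state = nxt
            path.append(state)
            if done:
                break
        # Evaporate everywhere, then reinforce cells on paths that reached the goal.
        for cell in pheromone:
            pheromone[cell] *= (1.0 - rho)
        if done:
            for cell in set(path):
                pheromone[cell] += 1.0 / len(path)
    return q, pheromone
```

Calling `train(4, 4, (0, 0), (3, 3))` returns the learned Q-table and the final pheromone map for a small obstacle-free grid. Tying epsilon to the strongest trail is one simple way to get high exploration early and lower exploration once good paths have been reinforced; other schedules would serve the same purpose.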
Keywords: Q-learning; path planning; Q-table initialization; exploration probability; ant colony algorithm; pheromone
Classification: TP273 [Automation and Computer Technology: Detection Technology and Automation Devices]