Authors: ZHANG Feng (张凤) [1]; GU Qiran (顾琦然); YUAN Shuai (袁帅) [1]
Affiliation: [1] School of Electrical and Control Engineering, Shenyang Jianzhu University, Shenyang 110168, China
Source: Computer Engineering and Applications (《计算机工程与应用》), 2023, No. 19, pp. 316-322 (7 pages)
Funding: National Natural Science Foundation of China, General Program (62073227); Liaoning Provincial Department of Education Fund (LJKZ0581).
Abstract: In mobile robot path planning, the DQN algorithm suffers from overestimation, low sample utilization, and sparse rewards, all of which hinder the robot from finding the optimal path. To address these problems, an end-to-end path planning method based on improved deep reinforcement learning is proposed, namely the curiosity distillation module dueling deep double Q-network prioritized experience replay (CDM-D3QN-PER) method. The method builds on D3QN to reduce the adverse effects of overestimation. A long short-term memory (LSTM) network is added at the input to process radar and camera information and obtain more useful information about the environment. Prioritized experience replay (PER) is used as the sampling method so that samples are used more fully and sample utilization improves. A curiosity distillation module (CDM) is introduced to alleviate the sparse-reward problem. In simulation experiments comparing against DQN, DDQN, and D3QN, the robot trained with CDM-D3QN-PER reaches the target point markedly more often, three times as often as with DQN. The algorithm raises the reward value, speeds up convergence, and can obtain the optimal path in complex unknown environments.
Keywords: DQN algorithm; D3QN algorithm; curiosity distillation module; long short-term memory (LSTM) network; optimal path
Classification number: TP39 [Automation and Computer Technology: Computer Application Technology]
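Note: The abstract names three standard building blocks (dueling architecture, double Q-learning, and prioritized experience replay) without giving formulas. The sketch below illustrates the commonly used textbook form of each in plain Python. It is not the authors' implementation: the function names, the `alpha` and `eps` defaults, and the list-based inputs are assumptions made for illustration, and the LSTM input stage and the curiosity distillation module are omitted because the abstract does not describe them in enough detail.

```python
def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(reward, done, gamma, q_online_next, q_target_next):
    """Double-DQN target: the online network selects the next action,
    the target network evaluates it, which reduces overestimation."""
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=lambda i: q_online_next[i])
    return reward + gamma * q_target_next[best_action]


def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional prioritized replay: P(i) = p_i^alpha / sum_k p_k^alpha,
    with priority p_i = |TD error_i| + eps (alpha and eps are assumed values)."""
    priorities = [(abs(e) + eps) ** alpha for e in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]
```

In this form, a transition with a larger temporal-difference error is sampled more often, which is the sense in which PER improves sample utilization.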