Path Planning Method for Mobile Robot Based on Curiosity Distillation Double Q-Network

Cited by: 1


Authors: ZHANG Feng [1], GU Qiran, YUAN Shuai [1]

Affiliation: [1] School of Electrical and Control Engineering, Shenyang Jianzhu University, Shenyang 110168, China

Source: Computer Engineering and Applications, 2023, No. 19, pp. 316-322 (7 pages)

Funding: National Natural Science Foundation of China, General Program (62073227); Liaoning Provincial Department of Education Fund (LJKZ0581).

Abstract: In mobile robot path planning, the DQN algorithm suffers from overestimation, low sample utilization, and sparse rewards, all of which hinder the robot from finding the optimal path. To address these problems, an end-to-end path planning method based on improved deep reinforcement learning is proposed, namely the curiosity distillation module dueling deep double Q-network prioritized experience replay (CDM-D3QN-PER). The method builds on D3QN to reduce the adverse effects of overestimation. A long short-term memory (LSTM) network is added at the input to process radar and camera information, yielding more useful environmental information. Prioritized experience replay (PER) is used as the sampling method so that samples are fully exploited and sample utilization improves. A curiosity distillation module (CDM) is introduced to alleviate the sparse-reward problem to some extent. In simulation experiments against DQN, DDQN, and D3QN, the robot trained with CDM-D3QN-PER reaches the target point significantly more often, three times as often as with DQN. The algorithm raises the reward value, speeds up network convergence, and obtains the optimal path more reliably in complex unknown environments.
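The abstract names four building blocks without giving their formulas: the dueling architecture, the double Q-learning target, prioritized experience replay, and a curiosity bonus. The sketch below illustrates the standard textbook form of each in plain Python. It is not the authors' implementation: every function name and hyperparameter (`gamma`, `alpha`, `eps`, `scale`) is an assumption made for illustration, and the curiosity term uses a generic distillation-style prediction error, which may differ from the paper's CDM.

```python
# Illustrative sketch only: names and hyperparameters are assumptions,
# not taken from the paper.

def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a of A(s, a)."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_q_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Double Q-learning target.

    The online network picks the next action and the target network
    evaluates it, which is what reduces overestimation relative to
    taking max(q_target_next) directly.
    """
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_action]


def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional prioritized replay: P(i) = p_i^alpha / sum_k p_k^alpha,
    with priority p_i = |TD error_i| + eps."""
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def curiosity_bonus(target_features, predicted_features, scale=1.0):
    """Assumed distillation-style intrinsic reward: squared error between
    a fixed target network's features and a trained predictor's features.
    Rarely visited states tend to have a larger error, hence a larger bonus."""
    return scale * sum(
        (t - p) ** 2 for t, p in zip(target_features, predicted_features)
    )


if __name__ == "__main__":
    q_online_next = dueling_q(1.0, [0.5, -0.5, 0.0])
    q_target_next = [0.2, 0.9, 0.4]
    print(double_q_target(1.0, False, q_online_next, q_target_next, gamma=0.9))
    print(per_probabilities([1.0, 0.0, 3.0]))
    print(curiosity_bonus([1.0, 2.0], [1.0, 0.0]))
```

In the usage at the bottom, the online network prefers action 0 while the target network's largest value belongs to action 1, so the double Q target is smaller than a plain max over the target network would give. In a sparse-reward setting the curiosity bonus would typically be added to the environment reward before the target is computed.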

Keywords: DQN algorithm; D3QN algorithm; curiosity distillation module; long short-term memory (LSTM) network; optimal path

Classification: TP39 [Automation and Computer Technology: Computer Application Technology]

 
