Path Planning for Mobile Robot Based on Deep Reinforcement Learning  Cited by: 30


Authors: DONG Yao (董瑶); GE Yingying (葛莹莹) [1,2]; GUO Hongyong (郭鸿湧); DONG Yongfeng (董永峰); YANG Chen (杨琛)

Affiliations: [1] School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China; [2] Hebei Provincial Key Laboratory of Big Data Computing, Hebei University of Technology, Tianjin 300401, China; [3] Hebei University of Engineering, Handan, Hebei 056038, China

Source: Computer Engineering and Applications (《计算机工程与应用》), 2019, No. 13, pp. 15-19, 157 (6 pages)

Funding: Tianjin Science and Technology Program (No. 14ZCDGSF00124); Natural Science Foundation of Tianjin (No. 16JCYBJC15600)

Abstract: To address the slow convergence of the conventional deep Q-network (DQN) model when a robot explores a complex, unknown environment, an improved deep double Q-network method based on the dueling network structure (Improved Dueling Deep Double Q-Network, IDDDQN) is proposed. The mobile robot uses the improved DDQN network structure to estimate the value functions of its three actions and to update the network parameters, obtaining the corresponding Q-values by training the network. It adopts an exploration strategy that combines the Boltzmann distribution with ε-greedy to select an optimal action and reach the next observation. The data collected during learning are stored in the experience replay memory using an improved resampling preference mechanism, and the network is trained on mini-batches of these data. Experimental results show that, compared with the basic DDQN algorithm, a robot trained with IDDDQN adapts to the unknown environment more quickly, the network converges faster, the success rate of reaching the target point increases by more than three times, and the optimal path can be obtained more effectively in unknown, complex environments.
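The three ingredients named in the abstract (dueling aggregation, the double Q-learning target, and a Boltzmann/ε-greedy exploration rule) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not say how the two exploration rules are combined, so `select_action` assumes one plausible scheme (sample from the Boltzmann distribution with probability ε, otherwise act greedily). All function names, the `temperature` and `gamma` parameters, and the mean-subtracted aggregation are assumptions chosen for illustration, and the improved resampling mechanism for the replay memory is not modelled.

```python
import math
import random


def dueling_q(value, advantages):
    """Combine a state value V(s) and per-action advantages A(s, a) into Q-values.

    Assumes the common mean-subtracted aggregation:
    Q(s, a) = V(s) + A(s, a) - mean(A(s, .)).
    """
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_q_target(reward, done, next_q_online, next_q_target, gamma=0.99):
    """Double Q-learning target: the online network picks the next action,
    the target network supplies the value of that action."""
    if done:
        return reward
    best = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best]


def boltzmann_probs(q_values, temperature=1.0):
    """Softmax over Q-values; subtracting the max keeps exp() numerically stable."""
    m = max(q_values)
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(q_values, epsilon, temperature=1.0, rng=random):
    """ASSUMED combination (not specified in the abstract): with probability
    epsilon sample an action from the Boltzmann distribution, otherwise
    take the greedy action."""
    if rng.random() < epsilon:
        probs = boltzmann_probs(q_values, temperature)
        return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

For a robot with three actions, `dueling_q(1.0, [0.5, -0.5, 0.0])` yields `[1.5, 0.5, 1.0]`, and `select_action` on those values with `epsilon=0.0` returns action 0. With a larger `epsilon`, lower-valued actions are still tried, but in proportion to their Boltzmann weight rather than uniformly, which is one way such a hybrid rule could speed up exploration.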

Keywords: deep double Q-network (DDQN); dueling network structure; resampling preference mechanism; Boltzmann distribution; ε-greedy strategy

Classification No.: TP399 [Automation and Computer Technology: Computer Application Technology]

 
