Improved Speedy Q-learning Algorithm Based on Double Estimator  (Cited by: 6)


Authors: ZHENG Shuai; LUO Fei[1]; GU Chun-hua[1]; DING Wei-chao; LU Hai-feng (School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China)

Affiliation: [1] School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China

Source: Computer Science (《计算机科学》), 2020, No. 7, pp. 179-185 (7 pages)

Funding: National Natural Science Foundation of China (61472139); 2017 Research Project on Educational and Teaching Laws and Methods of East China University of Science and Technology (ZH1726107).

Abstract: Q-learning is a classical reinforcement learning algorithm, but its conservative updating strategy and its overestimation of action values make it converge slowly. Speedy Q-learning and Double Q-learning are two variants of Q-learning that address slow convergence and overestimation, respectively. Starting from the Q-value update rule of Speedy Q-learning and the update strategy of Monte Carlo reinforcement learning, this paper derives an equivalent form of that update rule through theoretical analysis and mathematical proof. The equivalent form shows that Speedy Q-learning uses the estimate of the current Q value as the estimate of the historical Q value; this raises the agent's overall convergence speed, but the algorithm still overestimates, which slows convergence in the early iterations. To address this, an improved algorithm, Double Speedy Q-learning, is proposed, exploiting the fact that the double estimator of Double Q-learning can improve the agent's convergence speed. The double estimator separates the selection of the optimal action from the estimation of the maximum Q value, which improves the learning strategy of Speedy Q-learning in the early iterations and thus its overall convergence speed. Grid-world experiments of different scales, using both linear and polynomial learning rates, compare the early-stage and overall convergence speed of Q-learning and its improved variants. The results show that Double Speedy Q-learning converges faster than Speedy Q-learning in the early iterations, that its overall convergence speed is significantly faster than that of the compared algorithms, and that the gap between its actual average reward and the expected reward is the smallest.

Keywords: Q-learning; Double Q-learning; Speedy Q-learning; reinforcement learning

CLC Number: TP181 [Automation and Computer Technology - Control Theory and Control Engineering]
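For readers who want a concrete picture of the update rules discussed in the abstract, the standard Speedy Q-learning rule (Azar et al.) updates a visited state-action pair with two empirical Bellman backups, one from the previous iterate and one from the current iterate:

Q_{k+1}(s,a) = Q_k(s,a) + \alpha_k\,[\mathcal{T}_k Q_{k-1}(s,a) - Q_k(s,a)] + (1-\alpha_k)\,[\mathcal{T}_k Q_k(s,a) - \mathcal{T}_k Q_{k-1}(s,a)], \qquad \mathcal{T}_k Q(s,a) = r + \gamma \max_b Q(s',b).

The sketch below combines this rule with the double estimator of Double Q-learning, which chooses the greedy action with one table and evaluates it with the other. It is only a minimal illustration of how the two ideas fit together, reconstructed from the abstract's description; it is not the authors' exact Double Speedy Q-learning algorithm, and all names (bellman_backup, double_speedy_q_update, QA, QB, etc.) are hypothetical.

```python
import numpy as np

# Illustrative sketch only: a Speedy-style Q update combined with a double
# estimator, reconstructed from the abstract. Not the paper's exact algorithm.

def bellman_backup(Q_select, Q_eval, r, s_next, gamma, done):
    """Empirical Bellman backup with a double estimator: the greedy action is
    chosen with Q_select and evaluated with Q_eval, which avoids the
    max-induced overestimation of a single estimator."""
    if done:
        return r
    a_star = np.argmax(Q_select[s_next])        # action selection
    return r + gamma * Q_eval[s_next, a_star]   # value estimation

def double_speedy_q_update(QA, QB, QA_prev, QB_prev,
                           s, a, r, s_next, done, gamma, alpha, rng):
    """One update on a single transition (s, a, r, s_next).
    QA, QB are the current (n_states, n_actions) tables; QA_prev, QB_prev hold
    the previous iterates required by the Speedy Q-learning rule. As in
    Double Q-learning, the table to update is chosen at random."""
    if rng.random() < 0.5:
        # update table A: actions selected with the A tables, evaluated with B
        t_prev = bellman_backup(QA_prev, QB_prev, r, s_next, gamma, done)
        t_curr = bellman_backup(QA, QB, r, s_next, gamma, done)
        old = QA[s, a]
        QA[s, a] += alpha * (t_prev - QA[s, a]) + (1.0 - alpha) * (t_curr - t_prev)
        QA_prev[s, a] = old   # current iterate becomes the "previous" one
    else:
        # symmetric update for table B
        t_prev = bellman_backup(QB_prev, QA_prev, r, s_next, gamma, done)
        t_curr = bellman_backup(QB, QA, r, s_next, gamma, done)
        old = QB[s, a]
        QB[s, a] += alpha * (t_prev - QB[s, a]) + (1.0 - alpha) * (t_curr - t_prev)
        QB_prev[s, a] = old
```

Either of the learning-rate schedules mentioned in the abstract, for example a linear rate such as alpha_k = 1/k or a polynomial rate such as alpha_k = 1/k^w with 0 < w < 1, can be supplied as the alpha argument above.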

 
