RESEARCH ON Q-LEARNING ALGORITHM BASED ON METROPOLIS CRITERION

Cited by: 14


Authors: 郭茂祖 (Guo Maozu) [1], 王亚东 (Wang Yadong) [1], 刘扬 (Liu Yang) [1], 孙华梅 (Sun Huamei) [2]

Affiliations: [1] School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001; [2] School of Management, Harbin Institute of Technology, Harbin 150001

Published in: Journal of Computer Research and Development (《计算机研究与发展》), 2002, No. 6, pp. 684-688 (5 pages)

Funding: Supported by the National High-Tech Research and Development Program of China (863 Program) (2001AA115550); the National Natural Science Foundation of China (70071008); and the China Postdoctoral Science Foundation

Abstract: The balance between exploration and exploitation is a key problem in action selection for Q-learning. Pure exploitation quickly traps the agent in a local optimum; exploration can escape local optima and accelerate learning, but excessive exploration degrades the algorithm's performance. By formulating the search for an optimal policy in Q-learning as the search for an optimal solution in combinatorial optimization, this paper applies the Metropolis criterion of the simulated annealing algorithm to the trade-off between exploration and exploitation, and presents SA-Q-learning, a Q-learning algorithm based on the Metropolis criterion. Experimental comparisons show that SA-Q-learning converges faster than standard Q-learning and avoids the performance degradation caused by excessive exploration.
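The action-selection scheme described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the geometric temperature schedule, and the environment interface are assumptions. The core idea is that a randomly drawn candidate action competes with the greedy action and is accepted with the Metropolis probability min(1, exp(ΔQ / T)), so high temperature yields near-random exploration and low temperature yields near-greedy exploitation.

```python
import math
import random

def metropolis_action(q_values, temperature):
    """Select an action via the Metropolis criterion from simulated annealing.

    A random candidate action competes with the greedy action; the candidate
    is accepted with probability min(1, exp((Q_cand - Q_greedy) / T)).
    """
    greedy = max(range(len(q_values)), key=lambda a: q_values[a])
    candidate = random.randrange(len(q_values))
    delta = q_values[candidate] - q_values[greedy]
    if delta >= 0 or random.random() < math.exp(delta / temperature):
        return candidate
    return greedy

def sa_q_learning(env_step, n_states, n_actions, episodes=500,
                  alpha=0.1, gamma=0.9, t0=1.0, cooling=0.99):
    """One-step Q-learning with Metropolis-based action selection.

    `env_step(state, action)` is a placeholder for the task being learned;
    it must return (next_state, reward, done). Episodes start in state 0.
    """
    q = [[0.0] * n_actions for _ in range(n_states)]
    temperature = t0
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = metropolis_action(q[state], temperature)
            next_state, reward, done = env_step(state, action)
            best_next = max(q[next_state])
            q[state][action] += alpha * (reward + gamma * best_next
                                         - q[state][action])
            state = next_state
        # Anneal: temperature decays each episode, shifting the agent
        # from exploration toward exploitation over time.
        temperature = max(temperature * cooling, 1e-3)
    return q
```

As the temperature approaches zero, `metropolis_action` degenerates into pure greedy selection, which matches the abstract's point that annealing bounds the amount of exploration instead of keeping it fixed as in epsilon-greedy.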

Keywords: machine learning; Metropolis criterion; Q-learning algorithm

Classification: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

 
