Authors: 郭茂祖 [1]; 王亚东 [1]; 刘扬 [1]; 孙华梅 [2]
Affiliations: [1] School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001; [2] School of Management, Harbin Institute of Technology, Harbin 150001
Source: Journal of Computer Research and Development (《计算机研究与发展》), 2002, No. 6, pp. 684-688 (5 pages)
Funding: Supported by the National "863" High-Tech Research and Development Program of China (2001AA115550), the National Natural Science Foundation of China (70071008), and the China Postdoctoral Science Foundation
Abstract: The balance between exploration and exploitation is a key problem in action selection for Q-learning. Pure exploitation quickly traps the agent in a local optimum, whereas excessive exploration degrades the algorithm's performance even though it can escape local optima and accelerate learning. This paper casts the search for an optimal policy in Q-learning as the search for an optimal solution of a combinatorial optimization problem, and introduces the Metropolis criterion of the simulated annealing algorithm to handle the trade-off between exploration and exploitation, yielding a Metropolis-based Q-learning algorithm, SA-Q-learning. Experiments show that SA-Q-learning converges faster than standard Q-learning and avoids the performance degradation caused by excessive exploration.
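The record contains no code, but the action-selection rule the abstract describes can be illustrated with a minimal Python sketch. It is an interpretation of the abstract, not the paper's exact pseudocode: the Q-table layout (a dict keyed by (state, action) pairs defaulting to 0.0) and the random-proposal scheme are assumptions introduced here for illustration.

```python
import math
import random

def metropolis_action(Q, state, actions, temperature):
    """Metropolis-criterion action selection (a sketch based on the abstract,
    not the paper's exact procedure). Q is assumed to be a dict mapping
    (state, action) -> value; unseen pairs default to 0.0."""
    # Exploitation candidate: the greedy action under the current Q-values.
    greedy = max(actions, key=lambda a: Q.get((state, a), 0.0))
    # Exploration candidate: an action proposed uniformly at random.
    proposal = random.choice(actions)
    q_greedy = Q.get((state, greedy), 0.0)
    q_proposal = Q.get((state, proposal), 0.0)
    # Metropolis acceptance: always take a proposal at least as good as the
    # greedy action; accept a worse one with probability exp(dQ / T),
    # which shrinks as the temperature is annealed toward zero.
    if q_proposal >= q_greedy:
        return proposal
    if random.random() < math.exp((q_proposal - q_greedy) / temperature):
        return proposal
    return greedy
```

At high temperature this rule behaves almost like uniform random exploration; as the temperature is lowered across episodes (for example geometrically, T <- lambda * T with lambda slightly below 1, an assumed schedule), it degenerates into greedy exploitation, mirroring how simulated annealing narrows its search.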
Keywords: machine learning; Metropolis criterion; Q-learning algorithm
Classification: TP181 [Automation and Computer Technology / Control Theory and Control Engineering]