Authors: 郭茂祖 [1]; 王亚东 [1]; 刘扬 [1]; 孙华梅 [2]
Affiliations: [1] School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001; [2] School of Management, Harbin Institute of Technology, Harbin 150001
Source: Journal of Computer Research and Development (《计算机研究与发展》), 2002, No. 6, pp. 684-688 (5 pages)
Funding: Supported by the National "863" High-Tech Research and Development Program of China (2001AA115550), the National Natural Science Foundation of China (70071008), and the China Postdoctoral Science Foundation
Abstract: The balance between exploration and exploitation is a key problem in action selection for Q-learning. Pure exploitation quickly traps the agent in a local optimum, whereas excessive exploration degrades the algorithm's performance even though it can escape local optima and accelerate learning. This paper casts the search for an optimal policy in Q-learning as the search for an optimal solution of a combinatorial optimization problem, and introduces the Metropolis criterion of the simulated annealing algorithm to handle the trade-off between exploration and exploitation, yielding a Metropolis-based Q-learning algorithm, SA-Q-learning. Experiments show that SA-Q-learning converges faster than standard Q-learning and avoids the performance degradation caused by excessive exploration.
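The record contains no code, but the action-selection rule the abstract describes can be illustrated with a minimal Python sketch. It is an interpretation of the abstract, not the paper's exact pseudocode: the Q-table layout (a dict keyed by (state, action) pairs defaulting to 0.0) and the random-proposal scheme are assumptions introduced here for illustration.

```python
import math
import random

def metropolis_action(Q, state, actions, temperature):
    """Metropolis-criterion action selection (a sketch based on the abstract,
    not the paper's exact procedure). Q is assumed to be a dict mapping
    (state, action) -> value; unseen pairs default to 0.0."""
    # Exploitation candidate: the greedy action under the current Q-values.
    greedy = max(actions, key=lambda a: Q.get((state, a), 0.0))
    # Exploration candidate: an action proposed uniformly at random.
    proposal = random.choice(actions)
    q_greedy = Q.get((state, greedy), 0.0)
    q_proposal = Q.get((state, proposal), 0.0)
    # Metropolis acceptance: always take a proposal at least as good as the
    # greedy action; accept a worse one with probability exp(dQ / T),
    # which shrinks as the temperature is annealed toward zero.
    if q_proposal >= q_greedy:
        return proposal
    if random.random() < math.exp((q_proposal - q_greedy) / temperature):
        return proposal
    return greedy
```

At high temperature this rule behaves almost like uniform random exploration; as the temperature is lowered across episodes (for example geometrically, T <- lambda * T with lambda slightly below 1, an assumed schedule), it degenerates into greedy exploitation, mirroring how simulated annealing narrows its search.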
Keywords: machine learning; Metropolis criterion; Q-learning algorithm
Classification: TP181 [Automation and Computer Technology / Control Theory and Control Engineering]