Value Iteration Algorithm Based on Reinforcement Learning (基于强化学习的值迭代算法)

Authors: CUI Jun-xiao (崔军晓), ZHU Meng-ting (朱蒙婷), WANG Hai-yan (王海燕)[1], ZHANG Peng (章鹏)[1], WANG Hui (王辉)[1]

Affiliation: [1] College of Computer Science and Technology, Soochow University, Suzhou 215006, Jiangsu, China

Source: Computer Knowledge and Technology (《电脑知识与技术》), 2014, No. 11, pp. 7348-7350 (3 pages)

Abstract: Reinforcement learning learns a mapping from environment states to actions so as to obtain the maximum reward signal. Three families of methods can maximize the cumulative reward in reinforcement learning: value iteration, policy iteration, and policy search. This paper introduces the principles and algorithms of reinforcement learning, studies value iteration in discrete spaces both with an environment model (model-based) and without one (model-free), and applies the algorithm to the Gridworld problem with a fixed starting point and with a random starting point. Experimental results on Gridworld show that, compared with policy iteration, the algorithm converges faster and achieves better accuracy.

Keywords: reinforcement learning; value iteration; Gridworld

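Classification number: TP181 [Automation and Computer Technology: Control Science and Engineering; Automation and Computer Technology: Control Theory and Control Engineering]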
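
To illustrate the model-based value iteration mentioned in the abstract, here is a minimal sketch on a small Gridworld. It is not the authors' implementation. The 4x4 grid, the goal in the bottom-right corner, the reward of -1 per move, the discount factor of 0.9, and all function names are assumptions chosen only for this example. Each sweep applies the Bellman optimality backup V(s) = max_a [r(s, a) + gamma * V(s')] until the largest change falls below a threshold.

```python
# Hypothetical 4x4 Gridworld (not taken from the paper):
# states 0..15 in row-major order, state 15 is the terminal goal.
N = 4
GOAL = N * N - 1
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
GAMMA = 0.9         # assumed discount factor
STEP_REWARD = -1.0  # assumed reward for every move


def step(state, action):
    """Deterministic environment model: return (next_state, reward).

    A move that would leave the grid keeps the agent in place.
    The goal is absorbing and yields no further reward.
    """
    if state == GOAL:
        return state, 0.0
    row, col = divmod(state, N)
    d_row, d_col = ACTIONS[action]
    new_row = min(max(row + d_row, 0), N - 1)
    new_col = min(max(col + d_col, 0), N - 1)
    return new_row * N + new_col, STEP_REWARD


def action_values(values, state):
    """One-step lookahead: r + gamma * V(s') for each action."""
    result = []
    for action in range(len(ACTIONS)):
        next_state, reward = step(state, action)
        result.append(reward + GAMMA * values[next_state])
    return result


def value_iteration(theta=1e-8, max_sweeps=1000):
    """Sweep all states in place until the largest update is below theta."""
    values = [0.0] * (N * N)
    for _ in range(max_sweeps):
        delta = 0.0
        for state in range(N * N):
            if state == GOAL:
                continue  # terminal state keeps value 0
            best = max(action_values(values, state))
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < theta:
            break
    return values


def greedy_policy(values):
    """For each state, pick the action index with the highest lookahead value."""
    policy = []
    for state in range(N * N):
        q = action_values(values, state)
        policy.append(max(range(len(ACTIONS)), key=lambda a: q[a]))
    return policy


if __name__ == "__main__":
    v = value_iteration()
    for row in range(N):
        print(" ".join(f"{v[row * N + col]:7.3f}" for col in range(N)))
```

In the model-free setting the abstract also refers to, the `step` model would not be available for lookahead. The same backup would instead be estimated from sampled transitions, for example with a Q-learning style update.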
