Q-Learning Algorithm Based on Predictive State Representations

Cited by: 3

Source: Journal of Xi'an Jiaotong University, 2008, No. 12, pp. 1472-1475, 1485 (5 pages)

Funding: National "211 Project"; Ministry of Education "985 Project"

Abstract: A Q-learning algorithm based on predictive state representations (PSRs) is proposed for planning under uncertainty. The PSR method is combined with Q-learning: the PSR prediction vector serves as the state representation of the Q-learning algorithm, so that the resulting states have the Markov property and satisfy the requirements of reinforcement learning tasks. Q-learning is then used to learn the agent's optimal policy, which solves the problem of planning under uncertainty. Simulation results show that the number of episodes the algorithm needs to find a near-optimal policy for the agent is approximately the same as the number needed when the environment states are assumed to be known.
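The combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a linear PSR whose parameters are already known, uses the standard linear-PSR update p(q_i | h,a,o) = p(h)·m_{aoq_i} / p(h)·m_{ao}, and discretizes the prediction vector by rounding so that a tabular Q-function can be indexed by it. The helper names (`psr_update`, `state_key`, `q_update`), the rounding precision, and the learning parameters are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(u, v))


def psr_update(p, m_ao, m_aoq):
    """Update the prediction vector after taking action a and observing o.

    p     : current prediction vector (one entry per core test)
    m_ao  : parameter vector for the one-step test (a, o)
    m_aoq : list of parameter vectors, one per extended test (a, o, q_i)
    These parameters are assumed to be given; learning them is out of scope.
    """
    denom = dot(p, m_ao)
    if denom <= 0.0:
        raise ValueError("observation has zero probability under the model")
    return [dot(p, m) / denom for m in m_aoq]


def state_key(p, decimals=3):
    """Round the prediction vector so it can index a tabular Q-function."""
    return tuple(round(x, decimals) for x in p)


def q_update(Q, s, a, r, s_next, n_actions, alpha=0.1, gamma=0.9):
    """One standard Q-learning backup on (prediction-vector state, action)."""
    best_next = max(Q[(s_next, b)] for b in range(n_actions))
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]


# Usage with made-up numbers: two core tests, two actions.
Q = defaultdict(float)
p = [0.5, 0.5]
p_next = psr_update(p, m_ao=[1.0, 0.0], m_aoq=[[0.4, 0.0], [0.2, 0.0]])
new_q = q_update(Q, state_key(p), 0, 1.0, state_key(p_next), n_actions=2)
```

Rounding is only one way to make the continuous prediction vector usable as a table index; a function approximator over the prediction vector would be an alternative, and the source does not say which approach the paper takes.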

Keywords: planning under uncertainty; predictive state representations; Q-learning algorithm; cheese maze

Classification: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]
