基于策略迭代和值迭代的POMDP算法被引量：7

A Policy-and Value-Iteration Algorithm for POMDP

机构地区：[1]深圳职业技术学院电子与信息工程学院,广东深圳518055

出　　处：《计算机研究与发展》2008年第10期1763-1768,共6页Journal of Computer Research and Development

摘　　要：部分可观察Markov决策过程是通过引入信念状态空间将非Markov链问题转化为Markov链问题来求解,其描述真实世界的特性使它成为研究随机决策过程的重要分支.介绍了部分可观察Markov决策过程的基本原理和决策过程,提出一种基于策略迭代和值迭代的部分可观察Markov决策算法,该算法利用线性规划和动态规划的思想,解决当信念状态空间较大时出现的"维数灾"问题,得到Markov决策的逼近最优解.实验数据表明该算法是可行的和有效的.Partially observable Markov decision processes （POMDP） changes the non-Markovian into Markovian over the belief state space. It has been an important branch of stochastic decision processes for its characteristics of describing the real world. Tradional algorithms to solve POMPDs are value iteration algorithm and policy iteration algorithm. However, the complexity of exact solution algorithms for such POMDPs are typically computationally intractable for all but the smallest problems. At first, the authors describe the principles and decision processes of POMDP, and then present a policy- and valueiteration algorithm （PVIA） for partially observable Markov decision processes. This algorithm uses advantages of policy iteration and value iteration when programming makes use of policy iteration and when computing uses value iteration. This algorithm using linear programming and dynamic programming resolves curse of dimensionality problem when the belief state is large, and obtains the approximate optimal value. A key contribution of this paper is that it shows how the basic operations of both algorithms can be performed effciently together. The algorithm was implemented in the SZPT_Roc team, which took the 2nd place in the simulation league of the RoboCup 2006 Chinese Open Championship. Finally, compared with some typical algorithms, experimental results show that the algorithm is practical and feasible.

关键词：部分可观察Markov决策决策算法智能体值迭代策略迭代

分类号：TP18[自动化与计算机技术—控制理论与控制工程]

参考文献：

正在载入数据...

二级参考文献：

正在载入数据...

耦合文献：

正在载入数据...

引证文献：

正在载入数据...

二级引证文献：

正在载入数据...

同被引文献：

正在载入数据...

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

基于策略迭代和值迭代的POMDP算法被引量：7

我的收藏

参考文献：

二级参考文献：

耦合文献：

引证文献：

二级引证文献：

同被引文献：

相关期刊文献：

相关的主题

相关的作者对象

相关的机构对象

下载全文

高级检索检索式检索

时间限定

期刊范围

学科限定全选

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

基于策略迭代和值迭代的POMDP算法 被引量：7

我的收藏

参考文献：

二级参考文献：

耦合文献：

引证文献：

二级引证文献：

同被引文献：

相关期刊文献：

相关的主题

相关的作者对象

相关的机构对象

下载全文

用户登录

高级检索检索式检索

基于策略迭代和值迭代的POMDP算法被引量：7