Authors: Wang Zhenzhen [1]; Xing Hancheng [1]; Zhang Zhizheng [1]; Ni Qingjian [1]
Affiliations: [1] School of Computer Science and Engineering, Southeast University, Nanjing 210096; State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093
Source: Journal of Nanjing University (Natural Science), 2008, No. 2, pp. 148–156 (9 pages)
Funding: National Natural Science Foundation of China (90412014); Open Project of Novel Computer Software Technology (A200707)
Abstract: This paper presents a model called measure-valued Markov decision processes (MVMDPs). In this model, the agent's understanding of the environment is represented by the mathematical notion of a measure; the agent decides its optimal action according to this measure and thereby acquires its optimal policy. We therefore also present an algorithm for finding the optimal policy under an MVMDP, which can be regarded as an approximate optimal-policy algorithm for partially observable Markov decision processes (POMDPs). The model is a generalization of the POMDP: a POMDP is a particular case of a measure-valued Markov decision process. Nevertheless, it differs essentially from previous work on POMDPs. First, the usual approach to POMDPs transforms a partially observable Markov decision problem on a physical state space into a regular Markov decision process (MDP) on the corresponding belief-state space, identifying the belief state with a probability distribution over the state space; most POMDP models built on this idea therefore concentrate on algorithms of various kinds for finding the optimal policy and on refinements of existing techniques. Our work, by contrast, is not based on the transformation between a POMDP on a physical state space and an MDP on a belief-state space. Instead we take the measure, a notion more general than the belief state, over the state space as the new object of study; the Markov decision problem we discuss then takes place on the space composed of these measures. In this way we obtain a measure-valued Markov decision process.
Second, the MVMDP, which builds on recent theory of measure-valued branching processes in modern probability, reflects an important characteristic of human thought: people reason about problems and choose their optimal actions in situations where all possible states are grasped, i.e., where they can weigh and measure the whole state space.
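As an illustration of the POMDP special case the abstract refers to, the sketch below (not taken from the paper; all names are illustrative) shows the standard Bayes-filter belief update, in which the "measure" over states is an ordinary probability distribution: b'(s') ∝ O(o|s',a) Σ_s T(s'|s,a) b(s).

```python
# Illustrative sketch of the POMDP belief-state update, i.e. the special
# case of a measure on the state space that is a probability distribution.
import numpy as np

def belief_update(b, T, O, a, o):
    """One Bayes-filter step on the belief state.

    b : (S,)      current belief over states
    T : (A, S, S) transition probabilities T[a, s, s']
    O : (A, S, Z) observation probabilities O[a, s', o]
    """
    predicted = b @ T[a]              # sum_s b(s) * T(s'|s,a)
    unnorm = O[a, :, o] * predicted   # weight by observation likelihood
    return unnorm / unnorm.sum()      # renormalize to a probability measure

# Toy example: 2 states, 1 action, 2 observations
T = np.array([[[0.9, 0.1],
               [0.2, 0.8]]])
O = np.array([[[0.7, 0.3],
               [0.4, 0.6]]])
b = np.array([0.5, 0.5])
b_next = belief_update(b, T, O, a=0, o=0)
```

The update maps one point of the belief-state space to another, which is what lets the POMDP be treated as an MDP over beliefs; the paper's model replaces this probability distribution with a general measure.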
Classification: TP18 [Automation and Computer Technology - Control Theory and Control Engineering]