A theoretical model of measure-valued Markov processes simulating the divergent thinking of man (模拟人类发散思维的测度值马尔可夫理论模型)

Cited by: 1


Authors: 王蓁蓁 [1], 邢汉承 [1], 张志政 [1], 倪庆剑 [1]

Affiliations: [1] School of Computer Science and Engineering, Southeast University, Nanjing 210096; State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093

Source: Journal of Nanjing University (Natural Science) (《南京大学学报(自然科学版)》), 2008, No. 2, pp. 148-156 (9 pages)

Funding: National Natural Science Foundation of China (90412014); Open Project of Novel Software Technology (A200707)

Abstract: This paper presents a new model called measure-valued Markov decision processes (MVMDPs). In this model, the agent's grasp of the environment is represented by the mathematical notion of a measure; the agent decides its optimal action according to this measure and thereby obtains its optimal policy. Accordingly, the paper also gives an algorithm for finding the optimal policy of an MVMDP, which can be regarded as an approximate optimal-policy algorithm for partially observable Markov decision processes (POMDPs). The model is a generalization of the POMDP; that is, the POMDP is a particular case of the measure-valued Markov decision process. Nevertheless, it differs essentially from other work on POMDPs. First, the main idea of the general POMDP literature is to transform a partially observable Markov decision problem on a physical state space into a regular Markov decision problem (MDP) on the corresponding belief-state space, where the belief state is identified with a probability distribution over the state space; most POMDP models built on this idea therefore concentrate on algorithms of various kinds for finding the optimal policy and on refinements of existing techniques. Our work, by contrast, is not based on the transformation between a POMDP on a physical state space and an MDP on a belief-state space. Instead, we take the measure on the state space, a notion more general than the belief state, as a new object of study, and the Markov decision problem under discussion is posed on the space composed of these measures; in this way we obtain a measure-valued Markov decision process. Second, MVMDPs are based on the recent theory of measure-valued branching processes in modern probability and reflect an important characteristic of human thinking: people reason about problems and choose their optimal actions while grasping all the possible states, that is, while weighing and measuring the whole state space.
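The abstract describes the idea only in prose. The following minimal Python sketch illustrates, under simplifying assumptions, what planning on a space of measures can look like when the state space is finite, so that the measure reduces to a belief vector and the POMDP special case mentioned in the abstract is recovered. All names here (update_measure, lookahead_value, the random kernels T, O, R) are illustrative placeholders, not the paper's algorithm.

```python
# Hypothetical sketch (not from the paper): a tiny finite POMDP viewed as a
# measure-valued MDP. The agent's knowledge is a measure (here a probability
# vector) over the physical states, and planning happens on that measure space.
import numpy as np

n_states, n_actions, n_obs = 3, 2, 2
rng = np.random.default_rng(0)

# T[a, s, s'] : transition kernel, O[a, s', o] : observation kernel,
# R[s, a]     : immediate reward. Random placeholders for illustration only.
T = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
O = rng.dirichlet(np.ones(n_obs), size=(n_actions, n_states))
R = rng.normal(size=(n_states, n_actions))
gamma = 0.95

def update_measure(mu, a, o):
    """Bayes update of the measure mu after taking action a and observing o."""
    pred = mu @ T[a]              # predicted measure over next states
    post = pred * O[a][:, o]      # reweight by the observation likelihood
    z = post.sum()                # evidence P(o | mu, a)
    return (post / z if z > 0 else pred), z

def lookahead_value(mu, V, depth=2):
    """Finite-horizon Bellman lookahead on the measure space;
    V is any function mapping a measure to a terminal value estimate."""
    if depth == 0:
        return V(mu)
    best = -np.inf
    for a in range(n_actions):
        q = mu @ R[:, a]          # expected immediate reward under measure mu
        for o in range(n_obs):
            nxt, p_o = update_measure(mu, a, o)
            q += gamma * p_o * lookahead_value(nxt, V, depth - 1)
        best = max(best, q)
    return best

# Usage: start from the uniform measure and a zero terminal value estimate.
mu0 = np.ones(n_states) / n_states
print(lookahead_value(mu0, V=lambda m: 0.0))
```

In this finite setting the measure is exactly a belief state; the paper's point is that the construction also makes sense for more general measures on the state space, beyond normalized probability distributions.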

Keywords: measure value; measure-valued branching process; Markov decision process

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]

 
