The Existence of a Moment Optimal Policy in a Discounted Semi-Markov Decision Model with Unbounded Rewards

Original title: 无界报酬折扣半马氏决策模型矩最优策略的存在性


Author: Wu Congbin (伍从斌)[1]

Affiliation: [1] Department of Computer Science, Yunnan University

Source: Journal of Yunnan University (Natural Sciences Edition), 1991, No. 3, pp. 199-206 (8 pages)

Abstract: This paper studies the Lippman-type discounted semi-Markov decision model with a countable state space, an arbitrary action space, and unbounded rewards under the criterion of moment optimality. For every ε>0, the existence of a stationary k-th moment ε-optimal policy is proved. It follows that moment optimality among all policies is equivalent to moment optimality among stationary policies. A (k-1)-moment optimal policy π is also k-moment optimal if and only if (-1)^(k+1) V_k(π) satisfies the optimality equation, where V_k(π) is the k-th moment of the total discounted reward when π is used. Recursion formulae are given for all moments of the discounted reward under stationary policies. When the set of actions available in each state is finite, the existence of a stationary moment optimal policy is proved, and an iterative algorithm for constructing all stationary moment optimal policies is established.

Keywords: discounted model; unbounded rewards; optimal policy

Classification number: O212.5 [Science: Probability Theory and Mathematical Statistics]

 
