Author: Wu Congbin [1]
Affiliation: [1] Department of Computer Science, Yunnan University
Source: Journal of Yunnan University (Natural Sciences Edition), 1991, No. 3, pp. 199-206 (8 pages)
Abstract: This paper studies the Lippman-type discounted semi-Markov decision model with a countable state space, an arbitrary action space and unbounded rewards under the criterion of moment optimality. For every ε>0, the existence of stationary k-th moment ε-optimal policies is proved; consequently, moment optimality among all policies is equivalent to moment optimality among stationary policies. A (k-1)-moment optimal policy π is also (k)-moment optimal if and only if (-1)^(k+1) V_k(π) satisfies the optimality equation, where V_k(π) is the k-th moment of the total discounted reward when π is used. Recursion formulae are given for all moments of the discounted reward under stationary policies. When the set of actions available in each state is finite, the existence of a stationary moment optimal policy is proved, and an iterative algorithm that constructs all stationary moment optimal policies is developed.
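The abstract does not reproduce the recursion formulae or the iterative algorithm, so what follows is only a minimal sketch of the general idea under simplifying assumptions that are not the paper's: finitely many states and actions, bounded rewards r(i,a) collected at the start of each period, and a fixed per-step discount factor beta (a semi-Markov model in which every sojourn lasts one time unit). Expanding (r + beta*R')^k binomially, where R' is the discounted reward from the next state X_1 onward, gives V_k(i) = sum_{m<k} C(k,m) r(i,a)^(k-m) beta^m E[V_m(X_1)] + beta^k E[V_k(X_1)], with V_0 = 1. The hypothetical function moment_optimal_actions solves this equation one moment at a time by value iteration, maximizing (-1)^(k+1) V_k as in the abstract (odd moments maximized, even moments minimized), and after each step keeps only the actions that attain the maximum.

```python
from math import comb


def moment_optimal_actions(r, P, beta, K, tol=1e-9, eps=1e-12):
    """Lexicographic moment optimisation for a finite, discrete-time
    discounted MDP (a hypothetical simplification of the paper's model).

    r[i][a]    : reward for action a in state i
    P[i][a][j] : probability of moving from i to j under action a
    Returns (A, V): A[i] lists the actions kept in state i after
    optimising moments 1..K; V[k][i] is the k-th moment of the total
    discounted reward from state i under any policy choosing from A.
    """
    n = len(r)
    A = [list(range(len(r[i]))) for i in range(n)]  # start with all actions
    V = [[1.0] * n]  # V[0]: the zeroth moment is identically 1

    def expect(row, f):
        # sum_j P(j | i, a) * f(j)
        return sum(p * x for p, x in zip(row, f))

    def q(k, sign, i, a, W):
        # sign * b_k(i, a) + beta^k * E[W(next state)], where
        # b_k(i, a) = sum_{m<k} C(k, m) r^(k-m) beta^m E[V_m(next state)]
        b = sum(comb(k, m) * r[i][a] ** (k - m) * beta ** m
                * expect(P[i][a], V[m]) for m in range(k))
        return sign * b + beta ** k * expect(P[i][a], W)

    for k in range(1, K + 1):
        sign = (-1) ** (k + 1)  # +1: maximise V_k (k odd); -1: minimise (k even)
        W = [0.0] * n           # W = sign * V_k
        while True:  # value iteration; a contraction with modulus beta**k
            W_new = [max(q(k, sign, i, a, W) for a in A[i]) for i in range(n)]
            diff = max(abs(x - y) for x, y in zip(W_new, W))
            W = W_new
            if diff < eps:
                break
        # keep only the actions attaining the maximum (up to tol)
        A = [[a for a in A[i] if q(k, sign, i, a, W) >= W[i] - tol]
             for i in range(n)]
        V.append([sign * w for w in W])
    return A, V


# State 0: action 0 pays 1 for sure; action 1 pays 0 now and, with
# probability 0.5, leads to state 2, which pays 4. Both have mean 1.
r = [[1.0, 0.0], [0.0], [4.0]]
P = [[[0, 1, 0], [0, 0.5, 0.5]], [[0, 1, 0]], [[0, 1, 0]]]
A, V = moment_optimal_actions(r, P, beta=0.5, K=2)
print(A)  # [[0], [0], [0]]
```

In the example both actions in state 0 survive the first-moment step, but the second moment is 1 under action 0 and 2 under action 1, so only action 0 remains. Restricting every state to a single action turns the same routine into a moment evaluator for that stationary policy. The tolerance tol used to detect ties is an assumption of this sketch; exact tie detection would require exact arithmetic.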
CLC number: O212.5 [Science / Probability Theory and Mathematical Statistics]