Average cost Markov decision processes with countable state spaces

Authors: ZHANG Jun-yu [1], WU Yi-ting, XIA Li, CAO Xi-ren [3]

Affiliations: [1] School of Mathematics, Sun Yat-Sen University, Guangzhou, Guangdong 510275, China; [2] School of Business, Sun Yat-Sen University, Guangzhou, Guangdong 510275, China; [3] Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Hong Kong, China

Source: Control Theory & Applications (《控制理论与应用》), 2021, No. 11, pp. 1707-1716 (10 pages)

Funding: Supported by the National Natural Science Foundation of China (61673019, 61773411, 11931018, 62073346); the Guangdong Province Key Laboratory of Computational Science at the Sun Yat-sen University (2020B1212060032); the Guangdong Basic and Applied Basic Research Foundation (2021A1515010057, 2021A1515011984).

Abstract: For a Markov decision process (MDP) with a countable state space under the long-run average criterion, an optimal (stationary) policy may not exist. In this paper, we study optimal policies that satisfy an optimality inequality in a countable-state MDP under the long-run average criterion. In contrast to the vanishing discount approach, we use the discrete Dynkin's formula to derive the main results. We first present the Poisson equation of an ergodic Markov chain and two instructive examples of null recurrent Markov chains, and demonstrate the existence of optimal policies satisfying two optimality inequalities with opposite directions. Then, using two comparison lemmas and the performance difference formula, we prove the existence of optimal policies for positive recurrent chains and multi-chains, and extend the result to other situations. In particular, several application examples illustrate the essence of the performance sensitivity of the long-run average criterion. Our results supplement the existing theory on the optimality inequality of average-criterion MDPs with countable states.
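As an illustration of two tools named in the abstract, the following minimal sketch checks the Poisson equation and the performance difference formula numerically. It assumes a finite ergodic chain rather than the countable-state setting of the paper, and uses the standard forms (I - P) g + η e = f and η' - η = π' [(f' + P' g) - (f + P g)]; the matrices `P`, `P2` and cost vectors `f`, `f2` are made-up example data, and the function names are hypothetical, not taken from the paper.

```python
import numpy as np


def stationary_distribution(P):
    """Stationary distribution pi of an ergodic transition matrix P (pi P = pi, sum(pi) = 1)."""
    n = P.shape[0]
    # Stack the balance equations pi (I - P) = 0 with the normalization sum(pi) = 1.
    A = np.vstack([(np.eye(n) - P).T, np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


def poisson_solution(P, f):
    """Return (pi, eta, g) with (I - P) g + eta * e = f, normalized so that pi @ g = eta."""
    n = P.shape[0]
    pi = stationary_distribution(P)
    eta = float(pi @ f)  # long-run average cost
    # (I - P + e pi) is invertible for an ergodic chain; its solution satisfies the Poisson equation.
    g = np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), pi), f)
    return pi, eta, g


# Made-up example data: two policies on a 3-state chain (assumption, for illustration only).
P = np.array([[0.5, 0.5, 0.0], [0.2, 0.5, 0.3], [0.0, 0.6, 0.4]])
f = np.array([1.0, 2.0, 4.0])
P2 = np.array([[0.1, 0.9, 0.0], [0.4, 0.1, 0.5], [0.3, 0.0, 0.7]])
f2 = np.array([0.5, 2.5, 3.0])

pi, eta, g = poisson_solution(P, f)
pi2, eta2, _ = poisson_solution(P2, f2)

# Residual of the Poisson equation under the first policy (should vanish).
poisson_residual = (np.eye(3) - P) @ g + eta - f
# Right-hand side of the performance difference formula.
difference_rhs = float(pi2 @ ((f2 + P2 @ g) - (f + P @ g)))

print("eta =", eta, "eta' =", eta2)
print("eta' - eta =", eta2 - eta, "formula =", difference_rhs)
```

In the countable-state case the matrix inverse used here is not available in general, which is one reason the paper works with Dynkin's formula and optimality inequalities instead.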

Keywords: Markov decision process; average criterion; countable state space; Dynkin's formula; Poisson equation; performance sensitivity

Classification: O211.62 [Science: Probability Theory and Mathematical Statistics]
