Authors: 马帅 (MA Shuai), 夏俐 (XIA Li), School of Business, Sun Yat-sen University, Guangzhou 510275, China
Source: Acta Scientiarum Naturalium Universitatis Sunyatseni (中山大学学报(自然科学版)(中英文)), 2023, No. 1, pp. 181-191 (11 pages)
Funding: National Natural Science Foundation of China (62073346, U1811462).
Abstract: In the theory of Markov decision processes, the randomness of the objective stems not only from the stochasticity of the process but also from the randomness of the one-step reward and of the policy. When the optimality criterion concerns only the risk-neutral expectation of the objective, such as the average or discounted criterion, simplifying the random reward into a deterministic reward function does not affect the optimization result. However, the simplification changes the stochastic reward sequence, which modifies a risk-sensitive objective, i.e., a risk measure, and can thereby destroy the optimality of a policy. Since some theoretical methods require a simple reward function while practical environments often provide a complicated one, we propose a technique termed the state augmentation transformation to bridge this gap: it reorganizes the random information into an augmented state space, so that the transformed process has a reward function in a simple form while the stochastic reward sequence is preserved. Taking three classical risk measures defined on the cumulative discounted reward (variance, exponential utility, and conditional value at risk) as examples, we compare, in policy evaluation, the effects of reward simplification and of the state augmentation transformation on risk evaluation. Both the theoretical analysis and the numerical experiments show that the state augmentation transformation keeps the risk measures intact, whereas the reward simplification does not.
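To make the contrast concrete, the following is a minimal Monte Carlo sketch, not the paper's implementation: a one-state process with a Bernoulli one-step reward, where all names, parameters, and the toy reward distribution are illustrative assumptions. It shows that replacing a random reward by its expectation preserves the mean discounted return (a risk-neutral criterion) but changes risk measures of the cumulative discounted reward such as variance and conditional value at risk.

import numpy as np

rng = np.random.default_rng(0)
gamma, horizon, n_paths = 0.9, 50, 100_000

def discounted_return(reward_sampler):
    """Monte Carlo samples of the cumulative discounted reward sum_t gamma^t * r_t."""
    rewards = reward_sampler(size=(n_paths, horizon))   # one-step rewards r_t per path
    discounts = gamma ** np.arange(horizon)             # gamma^t
    return rewards @ discounts

# Original process: random reward, +1 or -1 with equal probability (expectation 0).
random_returns = discounted_return(lambda size: rng.choice([-1.0, 1.0], size=size))
# "Simplified" process: the random reward is replaced by its expectation (0).
simplified_returns = discounted_return(lambda size: np.zeros(size))

def cvar(x, alpha=0.05):
    """Conditional value at risk: mean of the worst alpha-fraction of outcomes."""
    cutoff = np.quantile(x, alpha)
    return x[x <= cutoff].mean()

for name, ret in [("random reward", random_returns), ("expected reward", simplified_returns)]:
    print(f"{name:16s} mean={ret.mean():+.3f}  var={ret.var():.3f}  CVaR_5%={cvar(ret):+.3f}")

The two means agree up to sampling noise, while the variance and CVaR collapse to degenerate values once the randomness of the reward is removed; a transformation that keeps the stochastic reward sequence intact, as the state augmentation transformation does, avoids this distortion.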