Risk-sensitive Markov decision processes and state augmentation transformation


Authors: MA Shuai, XIA Li (School of Business, Sun Yat-sen University, Guangzhou 510275, China)

Affiliation: [1] School of Business, Sun Yat-sen University, Guangzhou 510275, Guangdong, China

Source: Acta Scientiarum Naturalium Universitatis Sunyatseni (Journal of Sun Yat-sen University, Natural Sciences, Chinese and English), 2023, No. 1, pp. 181-191 (11 pages)

Funding: National Natural Science Foundation of China (62073346, U1811462).

Abstract: In a Markov decision process, the randomness of the process is determined by the policy and the transition kernel, while the randomness of the objective also stems from the random one-step reward and the random policy; a random reward can often be simplified into a deterministic reward function. Under classical expectation-based criteria, such as the average or discounted criterion, this reward simplification does not affect the optimization result. Under a risk-sensitive criterion, however, the simplification changes the stochastic reward sequence and hence the value of the risk objective, which can destroy the optimality of a policy. To bridge this gap, we propose a technique termed the state augmentation transformation, which reorganizes the random information into an augmented state space so that the transformed process has a reward function in a simple form while the stochastic reward sequence is preserved. Taking three classical risk measures defined on the cumulative discounted reward (variance, exponential utility, and conditional value at risk) as examples, we compare, in policy evaluation, the effects of reward simplification and the state augmentation transformation on risk estimation. Both theoretical verification and numerical experiments show that the state augmentation transformation keeps the risk measures intact while simplifying the reward function, whereas the reward simplification fails.
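To make the abstract's claim concrete, the following minimal Monte-Carlo sketch in Python (NumPy) evaluates the mean, the variance, the exponential utility -(1/beta) ln E[exp(-beta R)], and the conditional value at risk of the discounted cumulative reward R on a toy two-state chain under three settings: the original random reward, the simplified (expected) reward, and the state-augmented process, whose reward is a deterministic function of the augmented state (s, k), where k is the realized reward outcome. The chain, the reward tables, and all parameter values are illustrative assumptions, not the paper's experimental setup; only the qualitative comparison is illustrated.

import numpy as np

rng = np.random.default_rng(0)
gamma, horizon, n_paths = 0.9, 50, 200_000  # assumed discount, truncation, sample size

# Toy two-state Markov chain under a fixed policy (transition kernel only).
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])

# Random one-step reward: in state s, outcome k occurs w.p. r_prob[s, k]
# and pays r_val[s, k]. Both tables are illustrative assumptions.
r_val  = np.array([[0.0, 10.0], [-5.0, 5.0]])
r_prob = np.array([[0.8,  0.2], [ 0.5, 0.5]])
r_mean = (r_val * r_prob).sum(axis=1)  # "simplified" deterministic reward

def step_state(s):
    """Sample next states for all paths from the kernel P."""
    return np.where(rng.random(s.size) < P[s, 0], 0, 1)

def sample_outcome(s):
    """Sample the reward-outcome index k given the current states."""
    return (rng.random(s.size) >= r_prob[s, 0]).astype(int)

def simulate(mode):
    """Monte-Carlo samples of the discounted cumulative reward."""
    s = np.zeros(n_paths, dtype=int)          # all paths start in state 0
    k = sample_outcome(s)                     # outcome component of the augmented state
    total = np.zeros(n_paths)
    for t in range(horizon):
        if mode == "simplified":
            r = r_mean[s]                     # reward replaced by its expectation
        elif mode == "original":
            r = r_val[s, sample_outcome(s)]   # random reward, drawn on the spot
        else:                                 # "augmented": deterministic reward
            r = r_val[s, k]                   # as a function of the state (s, k)
        total += gamma**t * r
        s = step_state(s)
        k = sample_outcome(s)                 # refresh the augmented component
    return total

beta, alpha = 0.1, 0.05  # assumed risk-aversion parameter and CVaR tail level
for mode in ("original", "simplified", "augmented"):
    x = simulate(mode)
    eu = -np.log(np.mean(np.exp(-beta * x))) / beta  # exponential utility
    q = np.quantile(x, alpha)
    cvar = x[x <= q].mean()                          # lower-tail CVaR
    print(f"{mode:10s} mean={x.mean():7.3f} var={x.var():8.3f} "
          f"eu={eu:7.3f} cvar={cvar:7.3f}")

Running the sketch, the three settings agree on the mean, while the simplified reward understates the variance and distorts the two tail-sensitive measures; the augmented process reproduces the original values up to Monte-Carlo error, since its deterministic reward function generates the same stochastic reward sequence in distribution.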

Keywords: Markov decision process; state augmentation transformation; risk; reward function simplification

CLC number: O177.2 (Science: Mathematics)

 
