Authors: 马帅 (MA Shuai), 夏俐 (XIA Li), School of Business, Sun Yat-sen University, Guangzhou 510275, China
Source: Acta Scientiarum Naturalium Universitatis Sunyatseni (中山大学学报(自然科学版)(中英文)), 2023, No. 1, pp. 181-191 (11 pages)
Funding: National Natural Science Foundation of China (62073346, U1811462).
Abstract: In the theory of Markov decision processes, the randomness of the objective stems not only from the stochasticity of the process but also from the randomness of the one-step reward and of the policy. When the optimality criterion concerns only the risk-neutral expectation of the objective, such as the average or discounted criterion, simplifying the random reward into a deterministic reward function does not affect the optimization result. However, the simplification changes the stochastic reward sequence, which modifies a risk-sensitive objective, i.e., a risk measure, and can thereby destroy the optimality of a policy. Since some theoretical methods require a simple reward function while practical environments often provide a complicated one, we propose a technique termed the state augmentation transformation to bridge this gap: it reorganizes the random information into an augmented state space, so that the transformed process has a reward function in a simple form while the stochastic reward sequence is preserved. Taking three classical risk measures defined on the cumulative discounted reward (variance, exponential utility, and conditional value at risk) as examples, we compare, in policy evaluation, the effects of reward simplification and of the state augmentation transformation on risk evaluation. Both the theoretical analysis and the numerical experiments show that the state augmentation transformation keeps the risk measures intact, whereas the reward simplification does not.
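To make the contrast concrete, the following is a minimal Monte Carlo sketch, not the paper's implementation: a one-state process with a Bernoulli one-step reward, where all names, parameters, and the toy reward distribution are illustrative assumptions. It shows that replacing a random reward by its expectation preserves the mean discounted return (a risk-neutral criterion) but changes risk measures of the cumulative discounted reward such as variance and conditional value at risk.

import numpy as np

rng = np.random.default_rng(0)
gamma, horizon, n_paths = 0.9, 50, 100_000

def discounted_return(reward_sampler):
    """Monte Carlo samples of the cumulative discounted reward sum_t gamma^t * r_t."""
    rewards = reward_sampler(size=(n_paths, horizon))   # one-step rewards r_t per path
    discounts = gamma ** np.arange(horizon)             # gamma^t
    return rewards @ discounts

# Original process: random reward, +1 or -1 with equal probability (expectation 0).
random_returns = discounted_return(lambda size: rng.choice([-1.0, 1.0], size=size))
# "Simplified" process: the random reward is replaced by its expectation (0).
simplified_returns = discounted_return(lambda size: np.zeros(size))

def cvar(x, alpha=0.05):
    """Conditional value at risk: mean of the worst alpha-fraction of outcomes."""
    cutoff = np.quantile(x, alpha)
    return x[x <= cutoff].mean()

for name, ret in [("random reward", random_returns), ("expected reward", simplified_returns)]:
    print(f"{name:16s} mean={ret.mean():+.3f}  var={ret.var():.3f}  CVaR_5%={cvar(ret):+.3f}")

The two means agree up to sampling noise, while the variance and CVaR collapse to degenerate values once the randomness of the reward is removed; a transformation that keeps the stochastic reward sequence intact, as the state augmentation transformation does, avoids this distortion.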