M-step look-ahead policy iteration for semi-Markov decision processes based on performance potentials


Authors: WU Yu-hua [1], TANG Hao [1], ZHOU Lei [1]

Affiliation: [1] School of Computer and Information, Hefei University of Technology, Hefei 230009, China

Source: Journal of Jilin University (Engineering and Technology Edition), 2006, No. 6, pp. 958-962 (5 pages)

Funding: National Natural Science Foundation of China (60404009); Natural Science Foundation of Anhui Province (050420303); Young and Middle-aged Scientific and Technological Innovation Group Program of Hefei University of Technology

Abstract: Optimization of semi-Markov decision processes (SMDPs) is studied using an M-step look-ahead asynchronous policy iteration (PI) algorithm based on performance potentials. First, an M-step look-ahead PI algorithm derived from performance potential theory is presented. The algorithm covers both standard PI and general asynchronous PI, and treats SMDP optimization under the discounted and average criteria in a unified way. In addition, simulation-based M-step look-ahead PI algorithms using temporal-difference (TD) learning are given for both performance criteria. Finally, the features of these algorithms are compared through a numerical example.
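The M-step look-ahead idea in the abstract can be sketched as follows. This is a minimal illustration on a small discrete-time MDP under the discounted criterion, not the paper's algorithm: the SMDP setting (random sojourn times, performance potentials, asynchronous updates, and TD learning) is not reproduced, and the transition probabilities, rewards, and discount factor below are hypothetical. Each iteration evaluates the current policy exactly, then improves it by acting greedily over an M-step horizon before falling back on the current value function.

```python
import numpy as np

# Hypothetical 3-state, 2-action MDP used only to illustrate the
# M-step look-ahead improvement step (the paper treats SMDPs).
P = np.array([
    [[0.8, 0.2, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]],  # action 0
    [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]],  # action 1
])                      # P[a, s, s'] transition probabilities
r = np.array([[1.0, 0.0, 2.0],
              [0.5, 1.5, 0.0]])   # r[a, s] one-step rewards
nA, nS = r.shape
gamma = 0.9

def policy_value(policy):
    """Exact policy evaluation: solve v = r_pi + gamma * P_pi v."""
    Ppi = P[policy, np.arange(nS)]      # row s is P[policy[s], s, :]
    rpi = r[policy, np.arange(nS)]
    return np.linalg.solve(np.eye(nS) - gamma * Ppi, rpi)

def m_step_lookahead_q(v, M):
    """Q_M(a, s): act optimally for M steps, then use v as the tail value."""
    w = v.copy()
    for _ in range(M - 1):
        w = (r + gamma * P @ w).max(axis=0)   # one greedy backup
    return r + gamma * P @ w                  # shape (nA, nS)

def m_step_policy_iteration(M, iters=50):
    """PI where the improvement step uses an M-step look-ahead."""
    policy = np.zeros(nS, dtype=int)
    for _ in range(iters):
        v = policy_value(policy)
        new_policy = m_step_lookahead_q(v, M).argmax(axis=0)
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return policy, policy_value(policy)
```

With exact evaluation, the fixed point is the optimal policy for any M >= 1; larger M trades more work per improvement step for fewer iterations, which is the trade-off the paper's numerical example explores in the SMDP setting.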

Keywords: computer application; semi-Markov decision process; performance potential; M-step look-ahead policy iteration; temporal-difference learning

Classification: TP202 (Automation and Computer Technology: Detection Technology and Automation Devices)

 
