Parallel Q-learning Algorithms for MDPs Based on Performance Potentials


Authors: Cheng Wenjuan [1,2], Tang Hao [1], Li Bao [1], Zhou Lei [1]

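Affiliations: [1] School of Computer and Information, Hefei University of Technology, Hefei 230009; [2] School of Management, Hefei University of Technology, Hefei 230009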

Source: Journal of System Simulation, 2009, No. 9, pp. 2670-2674, 2678 (6 pages)

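Funding: National Natural Science Foundation of China (60404009); Natural Science Foundation of Anhui Province (070416242; 090412046); Key Natural Science Research Project of Anhui Provincial Universities (KJ2007A063)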

Abstract: Within the framework of performance potential theory, this paper studies unified parallel Q-learning algorithms for Markov decision processes (MDPs) under both the discounted and the average criteria. Two algorithms are proposed: an independent parallel Q-learning algorithm and a state-partition parallel Q-learning algorithm. The discussion focuses on the design of two key parameters: the synchronization strategy, which determines how synchronization points are chosen, and the Q-value building strategy, which determines how new Q-factors are constructed from some of the derived Q-factors. A synchronization strategy that combines a fixed step size with an offset is given. The principles for determining the Q-value building strategy in the parallel setting are analyzed, and several methods for selecting such a strategy are provided. Simulation results illustrate the effectiveness of the proposed parallel algorithms.
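To make the two design parameters named in the abstract concrete, the following is a minimal Python sketch of independent parallel Q-learning. It is an illustration under stated assumptions, not the authors' algorithm. It uses ordinary discounted Q-learning rather than the performance-potential-based unified update. The learners run sequentially as a stand-in for parallel workers. The building strategy (`build_average`, an element-wise mean) is a hypothetical choice. All function names, the toy MDP, and the parameter values are invented for this example.

```python
import random

def make_mdp(n_states, n_actions, rng):
    """Random toy MDP (assumption): P[s][a] holds next-state probabilities, R[s][a] a reward in [0, 1)."""
    P, R = [], []
    for _ in range(n_states):
        P_s, R_s = [], []
        for _ in range(n_actions):
            w = [rng.random() for _ in range(n_states)]
            total = sum(w)
            P_s.append([x / total for x in w])
            R_s.append(rng.random())
        P.append(P_s)
        R.append(R_s)
    return P, R

def q_step(Q, s, P, R, rng, gamma, lr, eps):
    """One epsilon-greedy discounted Q-learning update; returns the next state."""
    n_actions = len(Q[s])
    if rng.random() < eps:
        a = rng.randrange(n_actions)
    else:
        a = max(range(n_actions), key=lambda i: Q[s][i])
    s_next = rng.choices(range(len(P)), weights=P[s][a])[0]
    target = R[s][a] + gamma * max(Q[s_next])
    Q[s][a] += lr * (target - Q[s][a])
    return s_next

def build_average(tables):
    """Hypothetical building strategy: element-wise mean of the learners' Q-factors."""
    n = len(tables)
    return [[sum(t[s][a] for t in tables) / n for a in range(len(tables[0][s]))]
            for s in range(len(tables[0]))]

def is_sync_point(step, period, offset):
    """Fixed step plus offset: synchronize at offset, offset + period, offset + 2 * period, ..."""
    return step >= offset and (step - offset) % period == 0

def parallel_q_learning(P, R, n_learners=4, steps=2000, period=100, offset=50,
                        gamma=0.9, lr=0.1, eps=0.2, seed=0):
    """Run independent learners and merge their Q-tables at each synchronization point."""
    n_states, n_actions = len(P), len(P[0])
    rngs = [random.Random(seed + i) for i in range(n_learners)]
    tables = [[[0.0] * n_actions for _ in range(n_states)] for _ in range(n_learners)]
    states = [rng.randrange(n_states) for rng in rngs]
    n_syncs = 0
    for step in range(1, steps + 1):
        for i in range(n_learners):  # sequential stand-in for parallel workers
            states[i] = q_step(tables[i], states[i], P, R, rngs[i], gamma, lr, eps)
        if is_sync_point(step, period, offset):
            merged = build_average(tables)
            # Every learner continues from its own copy of the merged table.
            tables = [[row[:] for row in merged] for _ in range(n_learners)]
            n_syncs += 1
    return build_average(tables), n_syncs

if __name__ == "__main__":
    P, R = make_mdp(5, 2, random.Random(42))
    Q, n_syncs = parallel_q_learning(P, R)
    print("synchronizations:", n_syncs)
    print("greedy policy:", [max(range(2), key=lambda a: Q[s][a]) for s in range(5)])
```

In this sketch, `period` and `offset` play the role of the synchronization strategy, and replacing `build_average` with another merge rule (for example, a visit-count-weighted mean) changes the building strategy without touching the learners.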

Keywords: Q-learning; Markov decision process; performance potential; parallel algorithm

Classification: TP202 [Automation and Computer Technology: Detection Technology and Automatic Devices]

 
