Authors: Cheng Wenjuan [1,2], Tang Hao [1], Li Bao [1], Zhou Lei [1]
Affiliations: [1] School of Computer and Information, Hefei University of Technology, Hefei 230009, China; [2] School of Management, Hefei University of Technology, Hefei 230009, China
Source: Journal of System Simulation (《系统仿真学报》), 2009, Issue 9, pp. 2670-2674, 2678 (6 pages)
Funding: National Natural Science Foundation of China (60404009); Anhui Provincial Natural Science Foundation (070416242; 090412046); Key Natural Science Research Project of Anhui Provincial Universities (KJ2007A063)
Abstract: Within the framework of performance potential theory, unified parallel Q-learning algorithms are studied for Markov decision processes (MDPs) under both discounted and average criteria. An independent parallel Q-learning algorithm and a state-partition parallel Q-learning algorithm are proposed. The discussion focuses on the design of two key parameters: the synchronization strategy, i.e., how to choose synchronization points, and the Q-value building strategy, i.e., how to construct new Q-factors from the learners' derived Q-factors. A synchronization strategy combining a fixed step length with a certain offset is given; the principles for determining the Q-value building strategy in a parallel setting are analyzed, and several methods for selecting a building strategy are presented. Simulation results demonstrate the effectiveness of the parallel Q-learning algorithms.
CLC number: TP202 [Automation & Computer Technology — Detection Technology and Automatic Devices]
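The paper's concrete synchronization and Q-value building strategies are given in the full text; as a rough illustration of the idea only, the sketch below runs several independent Q-learners on a toy two-state MDP (invented here, not the paper's benchmark) and synchronizes them at fixed step intervals by averaging their Q-tables — just one conceivable building strategy, and the hyperparameters are arbitrary.

```python
import numpy as np

# Toy deterministic 2-state, 2-action MDP (illustrative only):
# action 1 always yields reward 1 and toggles the state; action 0 yields 0.
def step(s, a):
    return (s + a) % 2, float(a)

def parallel_q_learning(n_learners=2, steps=500, sync_every=20,
                        alpha=0.1, gamma=0.9, eps=0.2, seed=0):
    rng = np.random.default_rng(seed)
    Qs = [np.zeros((2, 2)) for _ in range(n_learners)]  # one Q-table per learner
    states = [0] * n_learners
    for t in range(steps):
        for i in range(n_learners):
            s = states[i]
            # epsilon-greedy action selection
            a = int(rng.integers(2)) if rng.random() < eps else int(np.argmax(Qs[i][s]))
            s2, r = step(s, a)
            # standard Q-learning update for the discounted criterion
            Qs[i][s, a] += alpha * (r + gamma * Qs[i][s2].max() - Qs[i][s, a])
            states[i] = s2
        # Fixed-step synchronization point: build a common Q-table by
        # averaging the learners' Q-factors (one possible building strategy).
        if (t + 1) % sync_every == 0:
            Q_avg = sum(Qs) / n_learners
            Qs = [Q_avg.copy() for _ in range(n_learners)]
    return Qs[0]

Q = parallel_q_learning()
```

On this toy MDP the always-rewarding action 1 should dominate in both states after training, so the greedy policy recovered from `Q` picks action 1 everywhere.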