Authors: JI Ting; ZHANG Hua[1]
Affiliation: [1] Key Laboratory of Robot and Welding Automation of Jiangxi Province, Nanchang University, Nanchang 330031, China
Source: 《计算机工程》 (Computer Engineering), 2018, No. 11, pp. 313-320 (8 pages)
Funding: National High Technology Research and Development Program of China (SS2013AA041003)
Abstract: To address the slow convergence of online approximate policy iteration reinforcement learning algorithms, a nonparametric approximate policy iteration parallel reinforcement learning algorithm is proposed. The number of parallel units is determined through the sample-collection process used to build learning units, and each reinforcement learning unit is designed on a Radial Basis Function (RBF) linear approximation structure. The units are then constructed autonomously using an estimation method that targets full coverage of the sample space, and each unit learns autonomously via approximate policy iteration. The overall policy of the algorithm is obtained by fusing the units with an average weighting method. Simulation results on a first-order inverted pendulum show that, compared with the online LSPI and BLSPI algorithms, the proposed algorithm achieves higher efficiency while maintaining a high speedup ratio, with fewer control parameters and faster convergence.
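The fusion step described in the abstract, where each parallel unit holds a linear RBF value estimate and the overall policy is obtained by average weighting, can be sketched as follows. This is a minimal illustration only: the function names, the Gaussian basis form, and the greedy policy are assumptions for the sketch, not the paper's exact formulation.

```python
import numpy as np

def rbf_features(state, centers, width):
    """Gaussian radial basis features of a state (linear approximation structure)."""
    diffs = centers - state                       # (n_centers, state_dim)
    return np.exp(-np.sum(diffs**2, axis=1) / (2.0 * width**2))

def fused_q_value(state, action_idx, unit_weights, centers, width):
    """Average-weighted fusion: the overall Q-value is the mean of the
    per-unit linear estimates w_i[a] . phi(s)."""
    phi = rbf_features(state, centers, width)     # (n_centers,)
    per_unit = [w[action_idx] @ phi for w in unit_weights]
    return float(np.mean(per_unit))

def greedy_action(state, n_actions, unit_weights, centers, width):
    """Overall policy: greedy with respect to the fused Q-function."""
    values = [fused_q_value(state, a, unit_weights, centers, width)
              for a in range(n_actions)]
    return int(np.argmax(values))
```

Each unit would fit its own weight matrix `w` (one row per action) by approximate policy iteration on its share of the samples; only the fusion of the resulting linear estimates is shown here.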
Keywords: parallel reinforcement learning; nonparametric; policy iteration; K-means clustering; inverted pendulum
Classification: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]