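Affiliation: [1] Jiangxi Provincial Key Laboratory of Robot and Welding Automation, Nanchang University, Nanchang 330031
Source: Control and Decision, 2017, Issue 12, pp. 2153-2161 (9 pages)
Funding: National 863 Program project (SS2013AA041003)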
Abstract: Current approximate policy iteration reinforcement learning algorithms generally suffer from heavy computation and from basis functions that cannot be constructed fully automatically. To address these problems, a nonparametric approximate generalized policy iteration reinforcement learning algorithm based on state clustering (NPAGPI-SC) is proposed. The algorithm collects samples through a two-stage random sampling process; computes the approximator's initial parameters with a trial-and-error process and an estimation method whose goal is complete coverage of the samples; adaptively adjusts the approximator during learning using the delta rule and the nearest-neighbor idea; and selects the action to execute with a greedy policy. Simulation results on balance control of a single inverted pendulum verify the effectiveness and robustness of the proposed algorithm.
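The adaptive part of the approximator described in the abstract (assigning states to clusters by nearest neighbor, adjusting values with the delta rule, and choosing actions greedily) can be illustrated with a minimal Python sketch. It is not the authors' NPAGPI-SC implementation: the class name `NearestNeighborQ`, the distance threshold `radius` that decides when a new cluster is created, the learning rate `alpha`, the discount `gamma`, and the Q-learning-style update target are all assumptions made for illustration. The two-stage random sampling and the trial-and-error initialization steps are omitted.

```python
import math
from typing import List, Sequence


class NearestNeighborQ:
    """Hypothetical cluster-based Q-value approximator (illustration only).

    Each prototype (cluster center) stands for a region of nearby states
    and stores one Q-value per discrete action. This is an assumed design,
    not the code from the paper.
    """

    def __init__(self, n_actions: int, radius: float, alpha: float,
                 gamma: float, init_q: float = 0.0):
        self.n_actions = n_actions
        self.radius = radius   # assumed: max distance before a new cluster is created
        self.alpha = alpha     # delta-rule learning rate
        self.gamma = gamma     # discount factor
        self.init_q = init_q   # initial Q-value for a newly created cluster
        self.centers: List[List[float]] = []
        self.q: List[List[float]] = []

    def _nearest(self, state: Sequence[float]) -> int:
        """Return the nearest cluster index, creating a cluster if none is within radius."""
        best, best_d = -1, math.inf
        for i, center in enumerate(self.centers):
            d = math.dist(center, state)
            if d < best_d:
                best, best_d = i, d
        if best_d > self.radius:
            # No existing cluster covers this state: grow the approximator.
            self.centers.append(list(state))
            self.q.append([self.init_q] * self.n_actions)
            best = len(self.centers) - 1
        return best

    def greedy_action(self, state: Sequence[float]) -> int:
        """Greedy policy: the action with the highest Q-value in the nearest cluster."""
        row = self.q[self._nearest(state)]
        return max(range(self.n_actions), key=row.__getitem__)

    def update(self, s: Sequence[float], a: int, r: float,
               s_next: Sequence[float], terminal: bool) -> float:
        """Delta-rule update Q <- Q + alpha * (target - Q); returns the error."""
        i = self._nearest(s)
        if terminal:
            target = r
        else:
            target = r + self.gamma * max(self.q[self._nearest(s_next)])
        delta = target - self.q[i][a]
        self.q[i][a] += self.alpha * delta
        return delta


if __name__ == "__main__":
    agent = NearestNeighborQ(n_actions=2, radius=0.5, alpha=0.5, gamma=0.9)
    agent.update([0.0, 0.0], 1, 1.0, [0.1, 0.0], terminal=True)
    print(len(agent.centers), agent.greedy_action([0.1, 0.0]))
```

In this sketch the number of clusters is not fixed in advance: a state farther than `radius` from every existing center spawns a new one, which is one simple way to avoid hand-designing basis functions. The trade-off is that the choice of `radius` controls how coarse the value approximation is.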
Classification code: TP181 [Automation and Computer Technology / Control Theory and Control Engineering]