A Model-Based Hierarchical Reinforcement Learning Algorithm

Hierarchical Reinforcement Learning Based on System Model

Source: Journal of Beijing Jiaotong University (《北京交通大学学报》), 2006, No. 5, pp. 1-5 (5 pages)

Funding: National Natural Science Foundation of China (Grant No. 60373029)

Abstract: State-value generalization and random exploration policies make reinforcement learning inefficient when it is used to control deterministic MDP systems. To address this, the paper proposes a hierarchical reinforcement learning algorithm based on a system model. The algorithm has a two-layer structure. The low layer uses the system model and a greedy policy to select exploratory actions, and carries out the reinforcement learning task. The high layer analyzes regions of the state space, guides the learning of the low layer, and corrects the wrong actions the low layer selects. This guidance takes three forms: (1) during generalization, correct and incorrect state-value estimates within the generalization region are updated with different learning factors, which reduces the effect of generalization on convergence; (2) inference rules are built over state regions and used to guide learning in unexplored regions, which speeds up learning; (3) the system model and the inference rules concentrate exploration on the controllable region of the system, which avoids the search over the entire state space that a random exploration policy requires. The proposed algorithm achieves preliminary control of the system in a relatively short time, and its effectiveness is verified by simulation results for the control of a double inverted pendulum.
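The abstract gives no formulas, so the following is only a minimal sketch of two of the ideas it names: greedy action selection through a deterministic system model, and a generalized value update that uses different learning factors for states judged correct or incorrect. All names (`greedy_action`, `generalized_update`, `is_correct`, `alpha_correct`, `alpha_wrong`), the tabular value representation, and the choice of which learning factor is larger are assumptions made for illustration, not the paper's implementation.

```python
from typing import Callable, Dict, Hashable, Iterable

State = Hashable
Action = Hashable


def greedy_action(
    state: State,
    actions: Iterable[Action],
    model: Callable[[State, Action], State],
    reward: Callable[[State, Action, State], float],
    value: Dict[State, float],
    gamma: float = 0.95,
) -> Action:
    """Low layer (sketch): use the deterministic model to look one step
    ahead and pick the action with the highest predicted return."""

    def one_step_return(action: Action) -> float:
        next_state = model(state, action)
        # Unvisited states default to a value of 0.0 (an assumption).
        return reward(state, action, next_state) + gamma * value.get(next_state, 0.0)

    return max(actions, key=one_step_return)


def generalized_update(
    value: Dict[State, float],
    region: Iterable[State],
    target: float,
    is_correct: Callable[[State], bool],
    alpha_correct: float = 0.5,
    alpha_wrong: float = 0.05,
) -> None:
    """High layer (sketch): move every state value in a generalization
    region toward `target`, with a learning factor that depends on whether
    the state's current estimate is judged correct.

    Assumption: the smaller factor is applied to states judged incorrect;
    the abstract only says that the two factors differ."""
    for s in region:
        alpha = alpha_correct if is_correct(s) else alpha_wrong
        old = value.get(s, 0.0)
        value[s] = old + alpha * (target - old)
```

On a hypothetical five-state chain with actions `-1` and `+1` and a reward for reaching the last state, `greedy_action` called at the state next to the goal selects `+1`, because the model predicts that successor directly and no random exploration is needed.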

Keywords: reinforcement learning; Markov decision process; exploration policy; inverted pendulum

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
