A Multi-Motive Reinforcement Learning Framework (一种多动机强化学习框架)  Cited by: 6

Authors: Zhao Fengfei (赵凤飞)[1], Qin Zheng (覃征)[1]

Affiliation: [1] Department of Computer Science and Technology, Tsinghua University, Beijing 100084

Source: Journal of Computer Research and Development (《计算机研究与发展》), 2013, No. 2, pp. 240-247 (8 pages)

Funding: National Defense Pre-Research Fund, 11th Five-Year Plan (402040202); National Defense Pre-Research Fund, 12th Five-Year Plan (041802008)

Abstract: Traditional reinforcement learning methods, with Q-learning as a representative example, maintain a table that maps states to actions. This simple two-layer mapping has been widely used in applications, but it lacks flexibility and cannot make effective use of prior knowledge to guide learning. To address this, a new reinforcement learning framework, multi-motive reinforcement learning (MMRL), is proposed. MMRL introduces a motive layer between the state layer and the action layer, in which multiple motives can be set based on experience, extending the original state-action two-layer structure to a state-motive-action three-layer structure. Two algorithms are presented under this framework: MMQ-unique and MMQ-voting. Traditional reinforcement learning methods can be seen as a degenerate form of multi-motive reinforcement learning, so the MMRL framework is a superset of the traditional methods. By adding the motive layer, the framework and its algorithms make reinforcement learning more flexible and use prior knowledge, encoded through the choice of motives, to speed up learning. Experiments show that, with reasonably chosen motives, multi-motive reinforcement learning learns significantly faster than traditional reinforcement learning.
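To make the contrast between the two-layer and three-layer structures concrete, the following is a minimal sketch in Python. The `q_update` function is the standard tabular Q-learning update, which is the state-action table the abstract refers to. The `vote_action` function is only an illustrative guess at how a motive layer might sit between states and actions: it assumes one Q-table per motive and a simple plurality vote. The motive names, the per-motive tables, and the voting rule are all assumptions made for illustration; the abstract does not specify how MMQ-unique or MMQ-voting work, and this sketch should not be read as either algorithm.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update on a state-action table."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


def greedy_action(q, state, actions):
    """Pick the action with the highest Q-value in this state (two-layer mapping)."""
    return max(actions, key=lambda a: q[(state, a)])


def vote_action(motive_tables, state, actions):
    """ASSUMPTION, not the paper's algorithm: each motive keeps its own table,
    votes for its greedy action, and the action with the most votes is chosen."""
    votes = defaultdict(int)
    for table in motive_tables.values():
        votes[greedy_action(table, state, actions)] += 1
    return max(actions, key=lambda a: votes[a])


if __name__ == "__main__":
    actions = ["left", "right"]

    # Two-layer baseline: one table, one update.
    q = defaultdict(float)
    q_update(q, "s0", "right", 1.0, "s1", actions)
    print(greedy_action(q, "s0", actions))

    # Hypothetical three-layer selection with made-up motive names.
    motives = {name: defaultdict(float) for name in ("seek_goal", "avoid_hazard", "save_energy")}
    motives["seek_goal"][("s0", "right")] = 1.0
    motives["avoid_hazard"][("s0", "right")] = 0.5
    motives["save_energy"][("s0", "left")] = 0.2
    print(vote_action(motives, "s0", actions))
```

With a single motive, `vote_action` reduces to `greedy_action` on that one table, which is one simple way to picture the abstract's remark that traditional methods are a degenerate case of the multi-motive framework.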

Keywords: reinforcement learning; multi-motive; Q-learning; MMQ-unique algorithm; MMQ-voting algorithm

CLC number: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]

 
