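Affiliation: [1] Department of Computer Science and Technology, Tsinghua University, Beijing 100084
Source: Journal of Computer Research and Development (《计算机研究与发展》), 2013, No. 2, pp. 240-247 (8 pages)
Funding: National Defense Pre-Research Foundation, "Eleventh Five-Year Plan" project (402040202); National Defense Pre-Research Foundation, "Twelfth Five-Year Plan" project (041802008)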
Abstract: Traditional reinforcement learning methods such as Q-learning maintain a table that maps states to actions. This simple two-layer mapping structure has been widely used in applications. However, the state-action structure lacks flexibility and cannot effectively use prior knowledge to guide the learning process. To solve this problem, a new reinforcement learning framework called multi-motive reinforcement learning (MMRL) is proposed. MMRL introduces a motive layer between the state layer and the action layer, in which multiple motives can be set based on experience. The original state-action two-layer structure is thereby extended to a state-motive-action three-layer structure. Under this framework, two corresponding algorithms are presented: the MMQ-unique algorithm and the MMQ-voting algorithm. Traditional reinforcement learning methods can be seen as a degenerate form of multi-motive reinforcement learning, so the MMRL framework is a superset of the traditional methods. By adding the motive layer, the framework and its algorithms improve the flexibility of reinforcement learning and use prior knowledge to speed up the learning process. Experiments show that, with reasonably chosen motives, multi-motive reinforcement learning learns noticeably faster than traditional reinforcement learning.
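Keywords: reinforcement learning; multi-motive; Q-learning; MMQ-unique algorithm; MMQ-voting algorithm
Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]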
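Note: The abstract describes the state-motive-action structure only in outline and gives no details of MMQ-unique or MMQ-voting. The following is a minimal sketch of one possible reading, not the paper's algorithm. It assumes each motive keeps its own Q-table and its own reward function (where prior knowledge could be encoded) and that the action is chosen by a simple majority vote. The names `MotiveQ`, `vote`, and `reward_fn`, and the values of `alpha` and `gamma`, are hypothetical. With a single motive the sketch reduces to ordinary tabular Q-learning, which matches the abstract's remark that traditional methods are a degenerate case.

```python
from collections import defaultdict


class MotiveQ:
    """One hypothetical motive: its own reward function and Q-table."""

    def __init__(self, reward_fn, alpha=0.5, gamma=0.9):
        self.reward_fn = reward_fn    # assumed: encodes prior knowledge
        self.alpha = alpha            # learning rate (illustrative value)
        self.gamma = gamma            # discount factor (illustrative value)
        self.q = defaultdict(float)   # (state, action) -> estimated value

    def best_action(self, state, actions):
        # Greedy action under this motive's table; ties go to the first action.
        return max(actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, next_state, actions):
        # Standard tabular Q-learning update, using this motive's own reward.
        reward = self.reward_fn(state, action, next_state)
        best_next = max(self.q[(next_state, a)] for a in actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


def vote(motives, state, actions):
    # Assumed voting rule: each motive casts one vote for its greedy action.
    counts = {a: 0 for a in actions}
    for motive in motives:
        counts[motive.best_action(state, actions)] += 1
    return max(actions, key=lambda a: counts[a])
```

In use, every motive would be updated on each observed transition, and `vote` would choose the next action, typically combined with some exploration strategy.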