Associative Reinforcement Learning Based on Continuous-Action Learning Automata (基于连续动作学习自动机的联想强化学习)

Cited by: 4

Author: Liu Xiao (刘晓) [1]

Source: Journal of Shanxi University (Natural Science Edition) (《山西大学学报（自然科学版）》), 2015, No. 3, pp. 426-431 (6 pages)

Abstract: Associative reinforcement learning is a machine learning problem in an uncertain environment, where the goal of the learning system is to determine an optimal output action for each input state of the environment. This paper proposes a new continuous-action learning automaton (CALA). The automaton uses a variable interval as its action set and generates output actions according to a uniform distribution over this interval. The learning algorithm adaptively updates the end-points of the action interval according to the success/failure signals fed back by the environment. The method is applied to two classical associative reinforcement learning problems, and simulation results demonstrate the superiority of the new algorithm over two existing CALA algorithms. Compared with the older algorithms, the learning performance of the new algorithm improves by 1.9% to 5.7% on average and by 22.4% to 65.2% at best.

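Keywords: learning automata; continuous-action learning automata; reinforcement learning; associative reinforcement learning; reward-penalty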

Classification: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]
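
The abstract describes the automaton only at a high level: an interval-valued action set, uniform sampling, and end-point updates driven by success/failure feedback. The sketch below illustrates that general idea in an associative setting, with one automaton per input state. The specific update rules (pull both end-points toward a rewarded action, widen the interval slightly after a failure), the parameters `rate` and `min_width`, and the toy environment in `run` are all assumptions made for illustration. They are not the paper's algorithm and do not reproduce its results.

```python
import random


class IntervalAutomaton:
    """Continuous-action automaton whose action set is a variable interval [lo, hi]."""

    def __init__(self, lo, hi, rate=0.05, min_width=1e-3, seed=None):
        self.lo_bound, self.hi_bound = lo, hi  # fixed outer limits of the action space
        self.lo, self.hi = lo, hi              # current adaptive end-points
        self.rate = rate                       # step size (hypothetical parameter)
        self.min_width = min_width             # smallest interval width allowed (hypothetical)
        self.rng = random.Random(seed)

    def act(self):
        # Actions are drawn uniformly from the current interval.
        return self.rng.uniform(self.lo, self.hi)

    def update(self, action, success):
        if success:
            # Assumed reward rule: move both end-points toward the rewarded action.
            new_lo = self.lo + self.rate * (action - self.lo)
            new_hi = self.hi - self.rate * (self.hi - action)
            if new_hi - new_lo >= self.min_width:
                self.lo, self.hi = new_lo, new_hi
        else:
            # Assumed penalty rule: widen the interval, clamped to the outer limits.
            width = self.hi - self.lo
            self.lo = max(self.lo_bound, self.lo - self.rate * width)
            self.hi = min(self.hi_bound, self.hi + self.rate * width)


def run(steps=5000, seed=0):
    """Toy associative task: each context has its own (hypothetical) best action."""
    rng = random.Random(seed)
    targets = {0: 0.2, 1: 0.8}
    automata = {c: IntervalAutomaton(0.0, 1.0, seed=seed + c) for c in targets}
    for _ in range(steps):
        context = rng.choice(list(targets))
        action = automata[context].act()
        # Success probability falls off linearly with distance from the target.
        p_success = max(0.0, 1.0 - 2.0 * abs(action - targets[context]))
        automata[context].update(action, rng.random() < p_success)
    return automata


if __name__ == "__main__":
    for context, automaton in run().items():
        print(context, round(automaton.lo, 3), round(automaton.hi, 3))
```

Keeping a separate interval per context is the simplest way to make the automaton associative. A real system with many or continuous input states would need some form of generalization across states, which this sketch does not attempt.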
