Quadratic Reward-Penalty Learning Automaton

Author: Liu Xiao (刘晓)[1]

Source: Aeronautical Computing Technique (《航空计算技术》), 1999, No. 2, pp. 47-49 (3 pages)

Abstract: This paper studies a nonlinear reinforcement algorithm for reward-penalty learning automata. Unlike the linear reward-penalty model (L_RP), the new model updates its action selection probabilities with a quadratic function. As a result, its learning performance is superior to that of L_RP, and the automaton exhibits different behaviours and properties in different environments.
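
For orientation, the linear baseline that the abstract compares against can be sketched as follows. This is the textbook L_RP update, not the paper's quadratic scheme: the abstract does not state the quadratic update function, so it is not reproduced here. The function names (`lrp_update`, `simulate`), the step sizes `a` and `b`, and the stationary random environment described by `reward_probs` are all illustrative assumptions.

```python
import random


def lrp_update(p, chosen, rewarded, a=0.1, b=0.1):
    """One step of the textbook linear reward-penalty (L_RP) update.

    p        -- current action probabilities (sums to 1, at least 2 actions)
    chosen   -- index of the action that was just performed
    rewarded -- True if the environment rewarded that action
    a, b     -- reward and penalty step sizes (assumed values, 0 < a, b < 1)
    """
    r = len(p)
    if rewarded:
        # Move probability mass toward the chosen action.
        return [p[j] + a * (1 - p[j]) if j == chosen else (1 - a) * p[j]
                for j in range(r)]
    # Move probability mass away from the chosen action, spread evenly.
    return [(1 - b) * p[j] if j == chosen else b / (r - 1) + (1 - b) * p[j]
            for j in range(r)]


def simulate(reward_probs, steps=1000, a=0.1, b=0.1, seed=0):
    """Run L_RP against a hypothetical stationary random environment.

    reward_probs[i] is the assumed probability that action i is rewarded.
    """
    rng = random.Random(seed)
    r = len(reward_probs)
    p = [1.0 / r] * r  # start from the uniform distribution
    for _ in range(steps):
        chosen = rng.choices(range(r), weights=p)[0]
        rewarded = rng.random() < reward_probs[chosen]
        p = lrp_update(p, chosen, rewarded, a, b)
    return p
```

Both branches keep the probabilities summing to one, and each new probability is a linear function of the old one. According to the abstract, the proposed model replaces this linear dependence with a quadratic one.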

Keywords: artificial intelligence; reinforcement learning; learning automata; quadratic reward-penalty

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
