Authors: 郭锐 [1], 吴敏 [1], 彭军 [1], 彭姣, 曹卫华 [1]
Affiliations: [1] School of Information Science and Engineering, Central South University, Changsha 410083; [2] Guizhou Provincial Expressway Development Corporation, Guiyang 550003
Source: Acta Automatica Sinica (《自动化学报》), 2007, No. 4, pp. 367-372 (6 pages)
Funding: Hunan Provincial Natural Science Foundation (06JJ50144); National Science Fund for Distinguished Young Scholars (60425310)
Abstract: Due to the presence of other agents, the environment of a multi-agent system (MAS) cannot simply be treated as a Markov decision process (MDP). Current reinforcement learning algorithms based on MDPs must therefore be reformed before they can be applied to MAS. Building on an agent's independent learning ability, this paper proposes a novel Q-learning algorithm for MAS in which an agent learns other agents' action policies by observing the joint action. The policies of the other agents are expressed as action probability distribution matrices, and a concise yet effective method for updating these matrices is proposed. The full joint probability distribution over the matrices guarantees that the learning agent selects the jointly optimal action. The convergence and performance of the proposed algorithm are analyzed theoretically. When applied to RoboCup, the algorithm showed high learning efficiency and good generalization ability. Finally, some directions for multi-agent reinforcement learning are briefly pointed out.
Classification: TP301.6 [Automation and Computer Technology: Computer System Architecture]
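The abstract describes an agent that models the other agents' policies from joint-action statistics and then maximizes its expected Q-value under that model. The paper's exact update rules are not given in this record, so the following is only a minimal sketch of the general joint-action-learning idea it describes: an empirical frequency table stands in for the "action probability distribution matrix", and action selection maximizes the Q-value averaged over that distribution. All class, method, and parameter names here are illustrative assumptions, not the authors' notation.

```python
import random
from collections import defaultdict

class JointActionLearner:
    """Sketch of a joint-action Q-learner (opponent modeling via counts).

    Q is tabular over (state, my_action, other_action); the other agent's
    policy is estimated as the empirical frequency of its observed actions,
    playing the role of the abstract's probability distribution matrix.
    """

    def __init__(self, actions, other_actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.actions = actions
        self.other_actions = other_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)                          # Q[(s, a, oa)]
        self.counts = defaultdict(lambda: defaultdict(int))  # counts[s][oa]

    def policy_of_other(self, state):
        """Empirical distribution over the other agent's actions in `state`."""
        c = self.counts[state]
        total = sum(c.values())
        if total == 0:  # no observations yet: assume uniform
            return {a: 1.0 / len(self.other_actions) for a in self.other_actions}
        return {a: c[a] / total for a in self.other_actions}

    def expected_q(self, state, my_action):
        """Q-value of my_action averaged over the modeled opponent policy."""
        p = self.policy_of_other(state)
        return sum(p[oa] * self.q[(state, my_action, oa)]
                   for oa in self.other_actions)

    def choose(self, state):
        """Epsilon-greedy choice against the expected Q-values."""
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.expected_q(state, a))

    def update(self, state, my_action, other_action, reward, next_state):
        """Record the observed joint action, then do a Q-learning backup."""
        self.counts[state][other_action] += 1
        best_next = max(self.expected_q(next_state, a) for a in self.actions)
        key = (state, my_action, other_action)
        self.q[key] += self.alpha * (reward + self.gamma * best_next - self.q[key])
```

In a simple coordination game (reward 1 only when both agents play the same action), the learner's frequency model quickly concentrates on the partner's fixed action, and the expected-Q maximization converges to matching it.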