Application of Reinforcement Q-Learning to Policy Optimization in Nondeterministic Markov Systems (cited by: 4)

The Application of Reinforcement Learning in Nondeterministic MDPs Policy Finding Question

Source: Computer Engineering and Applications (《计算机工程与应用》), 2005, No. 13, pp. 36-38, 146 (4 pages)

Funding: National 863 High-Tech Research and Development Program of China (No. 2001AA4422200)

Abstract: Reinforcement learning is a branch of machine learning in which an agent improves its policy by interacting with its environment; its online and adaptive learning make it a powerful tool for policy optimization problems. Multi-agent systems (MAS) are an active research topic in artificial intelligence, and research on multi-agent learning must be grounded in a model of the system environment. Because several agents are present and influence one another, a multi-agent system is highly complex, and its environment is a nondeterministic Markov decision process (MDP) even from the point of view of a single agent. Directly applying reinforcement learning techniques that assume a standard MDP to a multi-agent system is therefore inappropriate. Building on a mechanism in which agents learn independently, this paper proposes an improved multi-agent Q-learning algorithm suited to nondeterministic MDP environments and studies its application in RoboCup, a typical multi-agent system. Experiments demonstrate the effectiveness and generalization ability of the learning technique. Finally, the paper briefly outlines directions for multi-agent reinforcement learning research and further work.

Keywords: multi-agent; reinforcement learning; nondeterministic Markov system; policy optimization

Classification: TP24 [Automation and Computer Technology: Detection Technology and Automation Devices]
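
For orientation, the kind of update the abstract refers to can be sketched as follows. This is not the improved algorithm proposed in the paper, which the abstract does not specify. It is the standard textbook Q-learning rule for nondeterministic environments, in which a visit-count-based learning rate averages over stochastic outcomes. The two-state toy environment, the function names (q_update, noisy_step, train), and all parameter values are assumptions made purely for illustration.

```python
import random
from collections import defaultdict


def q_update(q, visits, state, action, reward, next_state, actions, gamma=0.9):
    """One textbook Q-learning update for a nondeterministic environment.

    The learning rate alpha = 1 / (1 + visits) shrinks as a state-action
    pair is revisited, so the estimate becomes a running average over the
    random rewards and transitions observed for that pair.
    """
    visits[(state, action)] += 1
    alpha = 1.0 / (1.0 + visits[(state, action)])
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] = (1.0 - alpha) * q[(state, action)] + alpha * target
    return q[(state, action)]


def noisy_step(state, action, rng):
    """Hypothetical toy environment with states 0 and 1.

    The chosen action has its intended effect with probability 0.8 and the
    opposite effect otherwise; landing in state 1 yields a reward of 1.
    """
    intended = 1 - state if action == "move" else state
    next_state = intended if rng.random() < 0.8 else 1 - intended
    reward = 1.0 if next_state == 1 else 0.0
    return next_state, reward


def train(steps=5000, seed=0, gamma=0.9):
    """Run a single independent learner with purely random exploration."""
    rng = random.Random(seed)
    actions = ["stay", "move"]
    q = defaultdict(float)
    visits = defaultdict(int)
    state = 0
    for _ in range(steps):
        action = rng.choice(actions)
        next_state, reward = noisy_step(state, action, rng)
        q_update(q, visits, state, action, reward, next_state, actions, gamma)
        state = next_state
    return q


if __name__ == "__main__":
    for key, value in sorted(train().items()):
        print(key, round(value, 3))
```

In an independent-learner setting such as the one the abstract describes, each agent would keep its own table and treat the other agents as part of the environment, which is one reason the transitions it observes appear nondeterministic.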
