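Affiliation: [1] School of Information Science and Engineering, Central South University, Changsha 410075
Source: Computer Engineering and Applications (《计算机工程与应用》), 2005, No. 13, pp. 36-38, 146 (4 pages)
Funding: National High-Tech Research and Development Program of China (863 Program), Grant No. 2001AA4422200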
Abstract: Reinforcement learning is a branch of machine learning in which an agent improves its policy by interacting with its environment. Its capacity for online and adaptive learning makes it a powerful tool for policy optimization problems. Multi-agent systems (MAS) are an active research area in artificial intelligence, and research on multi-agent learning must be grounded in a model of the system environment. Because several agents are present and influence one another, a multi-agent system is highly complex, and its environment is a nondeterministic Markov model; applying reinforcement learning techniques based on the Markov model directly to a multi-agent system is therefore inappropriate. Based on a mechanism in which agents learn independently, this paper proposes an improved multi-agent Q-learning algorithm suited to nondeterministic Markov environments and studies its application in RoboCup, a typical multi-agent system. Experiments demonstrate the effectiveness and generalization ability of the learning technique. The paper closes with a brief outline of directions for multi-agent reinforcement learning research and of further work.
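Keywords: multi-agent; reinforcement learning; nondeterministic Markov system; policy optimization
Classification: TP24 [Automation and Computer Technology: Detection Technology and Automatic Devices]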
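
Illustrative note: the record does not give the paper's improved update rule, so the following is not the authors' algorithm. It is a minimal sketch of the general idea the abstract describes, independent tabular Q-learning in which each agent treats the other agents as part of a nondeterministic environment. The class name `IndependentQLearner`, its parameters, and the visit-count learning rate `alpha = 1 / (1 + visits)` (a standard textbook choice for nondeterministic rewards and transitions) are all assumptions made for illustration.

```python
import random
from collections import defaultdict


class IndependentQLearner:
    """Hypothetical single-agent learner; other agents are folded into the environment."""

    def __init__(self, actions, gamma=0.9, epsilon=0.1, seed=None):
        self.actions = list(actions)
        self.gamma = gamma              # discount factor
        self.epsilon = epsilon          # exploration probability
        self.q = defaultdict(float)     # (state, action) -> estimated value
        self.visits = defaultdict(int)  # (state, action) -> number of updates
        self.rng = random.Random(seed)

    def choose(self, state):
        """Epsilon-greedy action selection over this agent's own Q-table."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        """Averaging update: a decaying learning rate smooths noisy outcomes."""
        self.visits[(state, action)] += 1
        alpha = 1.0 / (1 + self.visits[(state, action)])  # assumed schedule
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        old = self.q[(state, action)]
        self.q[(state, action)] = (1 - alpha) * old + alpha * target
        return self.q[(state, action)]
```

In a multi-agent setting, each agent would hold its own `IndependentQLearner` instance and call `update` with its own observation and reward; the decaying `alpha` is what keeps the estimates from chasing every fluctuation caused by the other agents' changing behavior.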