Affiliation: [1] School of Automation, Guangdong University of Technology, Guangzhou, Guangdong 510006
Source: Computer Engineering and Design (《计算机工程与设计》), 2014, No. 3, pp. 905-908 (4 pages)
Abstract: Reinforcement learning is an important method for solving learning-control problems in artificial intelligence. Among reinforcement learning algorithms, average-reward reinforcement learning uses the average reward as its reference standard and suits problems that are cyclic or have no terminal state, for example tasks whose optimal behavior is a limit cycle, where it is natural and computationally advantageous for the controller to maximize the average payoff received per time step. However, it is sensitive to parameters and to the environment, converges slowly, and emphasizes the independent learning of a single agent. To address these problems, the relationships and mutual influence between an individual agent and the other agents are taken into account: G-learning, an improved reinforcement learning algorithm based on performance potential, is introduced into multi-agent systems to design a new reinforcement learning algorithm, which is then applied on the RoboCup Keepaway platform. Simulation results show that, when a good reference state is chosen, the new algorithm effectively improves reinforcement learning performance on the Keepaway platform.
Keywords: soccer robot; reinforcement learning; performance potential; G-learning algorithm; multi-agent system
Classification number: TP242.6 [Automation and Computer Technology: Detection Technology and Automation Devices]