Affiliation: [1] School of Automation, Guangdong University of Technology, Guangzhou, Guangdong 510006
Source: Computer Simulation (《计算机仿真》), 2014, No. 7, pp. 338-341 (4 pages)
Abstract: Reinforcement learning (RL) and performance potential theory are active research topics in artificial intelligence (AI), and RoboCup soccer simulation provides a good experimental platform for AI and robotics research. When RL and performance potential theory are applied to soccer simulation, the solution process is unstable and convergence is slow. To address these problems, this work proposes a new RL algorithm, GA*-learning, an A* average-reward reinforcement learning algorithm based on performance potentials. GA*-learning adds a heuristic function to G-learning, the performance-potential-based average-reward RL algorithm, and chooses actions according to a heuristic policy, which accelerates convergence. GA*-learning is evaluated on keepaway, a simplified robot soccer domain, and the simulation results show that it improves both system performance and convergence speed.
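Keywords: reinforcement learning; performance potential; heuristic search; semi-Markov decision process
Classification: TP242 [Automation and Computer Technology: Detection Technology and Automatic Equipment]; TP391.9 [Automation and Computer Technology: Control Science and Engineering]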
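
The abstract says that a heuristic function influences action selection but gives no formula. The following is a minimal sketch of one common way to bias action choice with a heuristic, not the paper's actual rule: it assumes an epsilon-greedy policy in which the heuristic is added to the learned action values with a weight `xi`. The function name `select_action`, the parameters `xi` and `epsilon`, and the action labels are all hypothetical.

```python
import random


def select_action(values, heuristic, xi=1.0, epsilon=0.1, rng=random):
    """Pick an action from learned values biased by a heuristic bonus.

    values:    dict mapping action -> learned value estimate
    heuristic: dict mapping action -> heuristic bonus (missing means 0)
    xi:        assumed weight of the heuristic term (hypothetical)
    epsilon:   probability of choosing a uniformly random action
    """
    actions = list(values)
    # Explore: with probability epsilon, ignore values and heuristic.
    if rng.random() < epsilon:
        return rng.choice(actions)
    # Exploit: maximise learned value plus weighted heuristic bonus.
    return max(actions, key=lambda a: values[a] + xi * heuristic.get(a, 0.0))


# Hypothetical keepaway-style example: the learned values slightly
# favour "hold", but the heuristic bonus tips the choice to "pass".
values = {"hold": 1.0, "pass": 0.9}
heuristic = {"pass": 0.5}
choice = select_action(values, heuristic, epsilon=0.0)
```

With `xi=0.0` the heuristic is ignored and the choice reduces to ordinary epsilon-greedy selection over the learned values, so the weight controls how strongly the heuristic steers early learning.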