基于性能势的A*平均奖赏强化学习算法研究  (Cited by: 2)

Study on the A* Average Reward Reinforcement Learning Algorithm Based on Performance Potentials


Authors: Huang Haohui (黄浩晖), Yang Wanlu (杨宛璐)[1], Chen Wei (陈玮)[1]

Affiliation: [1] School of Automation, Guangdong University of Technology, Guangzhou, Guangdong 510006, China

Source: Computer Simulation (《计算机仿真》), 2014, No. 7, pp. 338-341 (4 pages)

Abstract: Reinforcement learning (RL) and performance potential theory are active research topics in artificial intelligence (AI), and RoboCup soccer simulation provides a good experimental platform for AI and robotics research. To address the unstable solution process and slow convergence that arise when RL and performance potential theory are applied to soccer robot simulation, this work proposes a new RL algorithm: the A* average reward reinforcement learning algorithm based on performance potentials (GA*-learning). GA*-learning adds a heuristic function to the average reward reinforcement learning algorithm based on performance potentials (G-learning) and selects actions according to a heuristic policy, which accelerates learning convergence. GA*-learning is applied to keepaway, a simplified robot soccer domain, and the simulation results show that the algorithm effectively improves system performance and convergence speed.
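The idea in the abstract, a heuristic function that biases action selection inside an average reward learner, can be illustrated with a minimal sketch. This is not the paper's GA*-learning or G-learning: the update below is a generic R-learning-style stand-in, and every name here (`select_action`, `average_reward_update`, the weight `xi`, the actions `hold` and `pass`) is a hypothetical choice made only for illustration.

```python
import random


def select_action(q_values, heuristic, xi=1.0, epsilon=0.1, rng=random):
    """Epsilon-greedy choice over the value estimate plus a weighted heuristic bonus.

    q_values:  dict mapping action -> current value estimate (assumed layout)
    heuristic: dict mapping action -> heuristic bonus; missing actions count as 0
    xi:        hypothetical weight controlling how strongly the heuristic steers
    """
    actions = list(q_values)
    if rng.random() < epsilon:
        return rng.choice(actions)  # explore uniformly at random
    # Exploit: the heuristic only changes which action is picked,
    # it is never written into the learned values.
    return max(actions, key=lambda a: q_values[a] + xi * heuristic.get(a, 0.0))


def average_reward_update(q, rho, s, a, r, s_next, alpha=0.1, beta=0.01):
    """One R-learning-style step (a stand-in, not the paper's G-learning rule).

    Updates q[s][a] in place and returns the new average reward estimate rho.
    """
    best_next = max(q[s_next].values())
    best_here = max(q[s].values())
    was_greedy = q[s][a] == best_here
    q[s][a] += alpha * (r - rho + best_next - q[s][a])
    if was_greedy:
        # The average reward estimate is adjusted only after greedy actions.
        rho += beta * (r - rho + best_next - best_here)
    return rho


# Usage with illustrative keepaway-like actions.
q = {"s0": {"hold": 0.5, "pass": 0.4}, "s1": {"hold": 0.0, "pass": 0.0}}
h = {"pass": 0.3}  # hypothetical heuristic favouring a pass in state s0
chosen = select_action(q["s0"], h, xi=1.0, epsilon=0.0)
rho = average_reward_update(q, 0.0, "s0", chosen, 1.0, "s1")
```

Because the heuristic enters only through the `max` in `select_action`, setting `xi=0` recovers plain epsilon-greedy selection, which makes it easy to compare learning with and without the heuristic.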

Keywords: reinforcement learning; performance potential; heuristic search; semi-Markov decision process

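Classification: TP242 [Automation and Computer Technology: Detection Technology and Automation Devices]; TP391.9 [Automation and Computer Technology: Control Science and Engineering]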

 
