Behavior Parameter Optimization of Robot Soccer Based on Reinforcement Learning


Authors: Gu Donglei (顾冬雷)[1], Chen Weidong (陈卫东)[1], Xi Yugeng (席裕庚)[1]

Affiliation: [1] Institute of Automation, Shanghai Jiao Tong University, Shanghai 200030

Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), 2001, No. 2, pp. 140-144 (5 pages)

Funding: Supported by the National 863 Program of China

Abstract: Reinforcement learning is popular in mobile robot research because its concept is concise and its implementation is simple. Current applications fall mainly into two areas: learning the mapping between discrete states and actions to obtain new behaviors, and coordinating existing behaviors to generate purposive behavior sequences. Using reinforcement learning to optimize behavior parameters is a practical way to improve a robot's performance. Researchers determine the structure and control logic of the behaviors, and the robot determines the optimal parameters through online trial-and-error learning during actual operation. This division of labor exploits human high-level intelligence while avoiding the difficulty that researchers cannot reach into the details of the robot's execution, which gives the method clear practical value. We developed a robot soccer simulator in which each robot has three behaviors based on three different motor schemas. This paper introduces a reinforcement learning method that optimizes the weights of the motor schemas within each behavior through online trial-and-error learning. The method uses a Gaussian kernel to distribute the reward over the whole action space, so it can handle continuous actions. One team's behavior parameters are held fixed while the other team learns the probability density distribution of the optimal parameters, because in robot soccer any policy a robot uses can only score with some probability. Simulation results show that the learning team's probability density distribution over behavior parameters converges and that the team can obtain the optimal parameters by online learning, which demonstrates the effectiveness of the method.
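The abstract does not give the update rule, so the following is only a minimal sketch of the general idea it describes: a probability density over one continuous behavior parameter is kept on a grid, a parameter value is sampled for each trial, and the reward from that trial is spread over neighboring values with a Gaussian kernel. Everything here is an assumption made for illustration, not the authors' implementation: the function names, the grid discretization, the additive update with renormalization, the `sigma` and `rate` values, and the `toy_trial` function standing in for a simulated match.

```python
import math
import random


def gaussian_kernel(grid, center, sigma):
    """Unnormalized Gaussian bump centered on `center`, evaluated on `grid`."""
    return [math.exp(-0.5 * ((x - center) / sigma) ** 2) for x in grid]


def normalize(density):
    """Scale a non-negative list so that it sums to 1."""
    total = sum(density)
    return [d / total for d in density]


def update_density(density, grid, action, reward, sigma=0.05, rate=0.002):
    """Spread `reward` around `action` with a Gaussian kernel, then renormalize.

    Assumed rule: additive update; values are clipped to stay positive so the
    result remains a valid sampling distribution even for negative rewards.
    """
    kernel = gaussian_kernel(grid, action, sigma)
    raised = [max(d + rate * reward * k, 1e-12) for d, k in zip(density, kernel)]
    return normalize(raised)


def learn(trial, steps=2000, bins=101, seed=0):
    """Sample a parameter, run one trial, and reinforce the sampled region."""
    rng = random.Random(seed)
    grid = [i / (bins - 1) for i in range(bins)]  # candidate weights in [0, 1]
    density = normalize([1.0] * bins)             # start from a uniform density
    for _ in range(steps):
        action = rng.choices(grid, weights=density, k=1)[0]
        reward = trial(action, rng)
        density = update_density(density, grid, action, reward)
    return grid, density


def toy_trial(weight, rng):
    """Hypothetical stand-in for one match: scoring is most likely near 0.7."""
    p_score = math.exp(-0.5 * ((weight - 0.7) / 0.1) ** 2)
    return 1.0 if rng.random() < p_score else 0.0


if __name__ == "__main__":
    grid, density = learn(toy_trial)
    best = max(range(len(grid)), key=lambda i: density[i])
    print("most probable weight:", grid[best])
```

Because each trial only scores with some probability, the sketch reinforces a region of the parameter space rather than a single value; the kernel width `sigma` controls how far one reward is generalized to nearby parameter values.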

Keywords: reinforcement learning; robot soccer; parameter optimization; control system; mobile robot

Classification code: TP242 [Automation and Computer Technology: Detection Technology and Automation Devices]

 
