Self-organizing multi-robot coordination based on Monte Carlo learning (基于蒙特卡罗学习的多机器人自组织协作)

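Affiliations: [1] School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001; [2] School of Mechanical and Power Engineering, Harbin University of Science and Technology, Harbin 150080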

Source: Computer Engineering and Applications (《计算机工程与应用》), 2007, No. 30, pp. 23-25, 32 (4 pages)

Funding: the National Natural Science Foundation of China under Grant No. 69985002; the National High-Tech Research and Development Plan of China (863 Program) under Grant No. 2006AA04Z259.

Abstract: Reinforcement learning is an effective way to improve the efficiency with which robots accomplish tasks in a multi-robot system. Most currently popular learning methods optimize the discounted cumulative reward, but the average reward is in some respects a more suitable criterion for multi-robot coordination. Learning algorithms based on discounted rewards, such as Q-learning, can improve performance at the level of individual robot actions, but they do not yield good cooperation at the multi-robot task level. Learning methods based on average reward, such as the Monte Carlo algorithm, can change this and achieve good results through cooperation at the task level. This paper applies Monte Carlo learning based on average reward to multi-robot cooperation and obtains good learning results. Experiments on real robots show that the algorithm adopting the average reward is superior to the one adopting the discounted cumulative reward.

Keywords: reinforcement learning; multi-robot coordination; Monte Carlo learning; Q-learning

Classification: TP242.6 [Automation and Computer Technology: Detection Technology and Automatic Instruments]
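The abstract contrasts two return criteria without stating them. The following is a minimal sketch of that contrast, not the paper's algorithm: it assumes an episodic task, an every-visit Monte Carlo estimator, and a made-up toy reward sequence. The names `discounted_return`, `average_reward`, `MonteCarloEstimator`, `greedy`, and `cooperative` are all hypothetical and do not come from the paper.

```python
from collections import defaultdict


def discounted_return(rewards, gamma=0.9):
    """Cumulative discounted return: sum of gamma**t * r_t over the sequence."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def average_reward(rewards):
    """Mean reward per step (0.0 for an empty sequence)."""
    return sum(rewards) / len(rewards) if rewards else 0.0


class MonteCarloEstimator:
    """Every-visit Monte Carlo value estimates keyed by (state, action).

    `criterion` turns the rewards observed from a visit until the end of
    the episode into a scalar target (assumed interface, for illustration).
    """

    def __init__(self, criterion):
        self.criterion = criterion
        self.values = defaultdict(float)
        self.counts = defaultdict(int)

    def update(self, episode):
        """Update estimates from one episode of (state, action, reward) tuples."""
        rewards = [reward for _, _, reward in episode]
        for step, (state, action, _) in enumerate(episode):
            target = self.criterion(rewards[step:])
            key = (state, action)
            self.counts[key] += 1
            # Incremental running mean of the targets seen for this pair.
            self.values[key] += (target - self.values[key]) / self.counts[key]


# Toy data (invented): small immediate rewards versus one delayed task reward.
greedy = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
cooperative = [0, 0, 0, 0, 0, 0, 0, 0, 0, 10]

print(discounted_return(greedy, gamma=0.5), discounted_return(cooperative, gamma=0.5))
print(average_reward(greedy), average_reward(cooperative))
```

With the assumed discount factor of 0.5, the discounted criterion ranks the `greedy` sequence higher (1.5 versus about 0.02), while the average criterion ranks the `cooperative` sequence higher (1.0 versus 0.2). This is one way to read the abstract's claim that a delayed, task-level payoff can be undervalued by discounting; the actual reward design and experimental setup are described only in the full paper.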
