Multi-agent cooperative reinforcement learning algorithm based on practical reasoning  Cited by: 3

Authors: Pan Ying (潘莹) [1,2], Li Dehua (李德华) [1], Liang Jingzhang (梁京章) [2], Wang Junying (王俊英) [1]

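Affiliations: [1] Institute of Image Recognition and Artificial Intelligence, Huazhong University of Science and Technology, Wuhan, Hubei 430074; [2] Information Network Center, Guangxi University, Nanning, Guangxi 530004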

Source: Journal of Huazhong University of Science and Technology (Natural Science Edition), 2010, No. 4, pp. 54-57 (4 pages)

Funding: National Natural Science Foundation of China (69775022); National High Technology Research and Development Program of China (863-306-ZT04-06-3)

Abstract: Directly extending single-agent cooperative Q-learning to a multi-agent system makes the set of state-action pairs grow sharply (exponentially in the number of agents), which slows down the agents' cooperative learning. To address this problem, a multi-agent cooperative reinforcement learning algorithm based on practical reasoning is proposed. Within the practical reasoning framework, the deliberation process first determines each agent's sub-intention by taking the group intention into account. Then, in the means-ends reasoning process, Q-learning is used to derive the optimal policy for achieving each sub-intention, and thereby the group intention. In this Q-learning algorithm each agent only needs to update the value function of its own state-action pairs and can ignore the value functions of the other agents, which greatly reduces the space complexity of the algorithm and speeds up learning. Simulation results on the pursuit problem verify the effectiveness of the algorithm.
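The abstract's central claim is that each agent updates only its own state-action values, so storage grows linearly rather than exponentially with the number of agents. The sketch below illustrates that idea with independent tabular Q-learning on a toy pursuit grid. It is not the authors' algorithm: the grid size, the stationary prey, the reward values, the `deliberate` rule that assigns each pursuer a cell next to the prey, and all function and class names are hypothetical assumptions chosen for illustration.

```python
import random
from collections import defaultdict

# Hypothetical action set for each pursuer: up, down, right, left, stay.
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0), (0, 0)]


def table_sizes(num_states, num_actions, num_agents):
    """Q-table entries for one joint table vs. one small table per agent.

    Assumes each agent's local state space has num_states elements, so the
    joint table has (|S| * |A|) ** n entries while n independent tables
    have n * |S| * |A| entries in total.
    """
    joint = (num_states * num_actions) ** num_agents
    independent = num_agents * num_states * num_actions
    return joint, independent


def deliberate(prey, num_pursuers):
    """Hypothetical deliberation rule: turn the group intention
    ("surround the prey") into one capture cell (sub-intention) per pursuer."""
    px, py = prey
    cells = [(px, py + 1), (px, py - 1), (px + 1, py), (px - 1, py)]
    return cells[:num_pursuers]


class IndependentQAgent:
    """Tabular Q-learner that stores and updates only its own Q-values."""

    def __init__(self, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.q = defaultdict(float)  # (own state, action index) -> value
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def act(self, state):
        # Epsilon-greedy choice with random tie-breaking.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(ACTIONS))
        values = [self.q[(state, a)] for a in range(len(ACTIONS))]
        best = max(values)
        return self.rng.choice([a for a, v in enumerate(values) if v == best])

    def update(self, s, a, reward, s_next, done):
        # One-step Q-learning; no other agent's table is read or written.
        future = 0.0 if done else max(self.q[(s_next, b)] for b in range(len(ACTIONS)))
        self.q[(s, a)] += self.alpha * (reward + self.gamma * future - self.q[(s, a)])


def move(pos, action, size):
    """Apply an action and clip the result to the size x size grid."""
    dx, dy = ACTIONS[action]
    return (min(max(pos[0] + dx, 0), size - 1), min(max(pos[1] + dy, 0), size - 1))


def train(size=5, prey=(2, 2), starts=((0, 0), (4, 4)), episodes=300, max_steps=50):
    """Toy pursuit (all assumptions): stationary prey, no collision handling,
    reward -1 per step and +10 on reaching the assigned capture cell."""
    goals = deliberate(prey, len(starts))
    agents = [IndependentQAgent(seed=i) for i in range(len(starts))]
    for _episode in range(episodes):
        positions = list(starts)
        for _t in range(max_steps):
            for i, agent in enumerate(agents):
                if positions[i] == goals[i]:
                    continue  # this agent's sub-intention is already achieved
                s = positions[i]
                a = agent.act(s)
                s_next = move(s, a, size)
                done = s_next == goals[i]
                agent.update(s, a, 10.0 if done else -1.0, s_next, done)
                positions[i] = s_next
            if all(p == g for p, g in zip(positions, goals)):
                break  # group intention (prey surrounded) achieved
    return agents, goals
```

For example, with four pursuers on a 5 x 5 grid and five actions each, `table_sizes(25, 5, 4)` gives 244,140,625 entries for a joint table versus 500 entries across four independent tables. In this sketch the learners never see each other's values; coordination comes entirely from the deliberation step handing each pursuer a distinct sub-intention.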

Keywords: multi-agent system; reinforcement learning; Markov process; cooperation; practical reasoning; deliberation process

CLC number: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]