Multi-agent Q-learning Algorithm Based on Consensus

Authors: CUI Haoyan; ZHANG Zhen; ZHAO Dejing; LIAO Dengyu (School of Automation, Qingdao University, Qingdao 266071, China; Shandong Key Laboratory of Industrial Control Technology, Qingdao 266071, China)

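Affiliations: [1] School of Automation, Qingdao University, Qingdao, Shandong 266071 [2] Shandong Key Laboratory of Industrial Control Technology, Qingdao, Shandong 266071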

Source: Control Engineering of China, 2024, No. 7, pp. 1169-1177 (9 pages)

Funding: National Natural Science Foundation of China (61903209); Qingdao Postdoctoral Applied Research Project.

Abstract: To address the limited communication ability of agents in multi-agent systems and the curse of dimensionality of the joint action space in multi-agent reinforcement learning, a multi-agent Q-learning based on consensus (MAQC) algorithm is proposed. The algorithm adopts a centralized training and decentralized execution framework. In the centralized training stage, MAQC uses a value decomposition method to alleviate the curse of dimensionality of the joint action space. In addition, each agent sends the local state it perceives, together with the local states it has received from its neighbors, to all of its neighbors, so that every agent in the network eventually obtains the global state of all agents. The temporal-difference information required by each agent is obtained by a consensus algorithm, and each agent needs to send only the component information of the temporal-difference information to its neighbors. In the execution stage, each agent selects its action using only the Q-value function related to its own action. Results show that the MAQC algorithm can converge to the optimal joint policy.
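The consensus step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the decomposition or the consensus protocol, so everything here is an assumption made for illustration: the temporal-difference quantity is taken to be a sum of per-agent components, the communication graph is undirected and connected, and Metropolis mixing weights are used. The names `metropolis_weights`, `consensus_sum`, and `td_components` are hypothetical and do not come from the paper.

```python
import numpy as np


def metropolis_weights(adjacency):
    """Build a symmetric, doubly stochastic mixing matrix (assumed protocol)."""
    adjacency = np.asarray(adjacency, dtype=float)
    n = adjacency.shape[0]
    degree = adjacency.sum(axis=1)
    weights = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and adjacency[i, j] > 0:
                # Each edge weight depends only on the two endpoint degrees.
                weights[i, j] = 1.0 / (1.0 + max(degree[i], degree[j]))
        # The self-weight makes every row sum to one.
        weights[i, i] = 1.0 - weights[i].sum()
    return weights


def consensus_sum(local_values, adjacency, iterations=200):
    """Estimate the network-wide sum using neighbor-only mixing.

    On a connected undirected graph every entry converges to the average
    of the initial values, so multiplying by the number of agents gives
    each agent an estimate of the sum.
    """
    weights = metropolis_weights(adjacency)
    x = np.asarray(local_values, dtype=float).copy()
    for _ in range(iterations):
        # Row i of `weights` is nonzero only for agent i and its neighbors.
        x = weights @ x
    return len(x) * x


# Toy example: three agents on a line graph, 0 - 1 - 2 (illustrative values).
adjacency = [[0, 1, 0],
             [1, 0, 1],
             [0, 1, 0]]
td_components = [0.5, -0.2, 0.3]
estimates = consensus_sum(td_components, adjacency)
```

In this sketch each agent ends with the same estimate of the summed temporal-difference components without ever communicating with a non-neighbor, which is the property the abstract attributes to the consensus step.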

Keywords: multi-agent reinforcement learning; agent communication; consensus; Q-learning; value decomposition

Classification: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]
