Authors: ZHAO Dejing (赵德京), MA Hongcong (马洪聪), WANG Jiayao (王家曜), ZHOU Weiqing (周维庆)
Affiliations: [1] School of Automation, Qingdao University, Qingdao, Shandong 266071, China; [2] Qingdao Petrochemical Maintenance and Installation Engineering Co., Ltd., Qingdao, Shandong 266043, China
Source: Automation & Instrumentation, 2022, No. 6, pp. 13-16, 22 (5 pages)
Funding: Qingdao Postdoctoral Applied Research Project "AGV Road Network Design and Path Planning Method Based on Multi-Agent Reinforcement Learning".
Abstract: As a classical reinforcement learning algorithm, Q-learning suffers from high computational cost and slow convergence in discrete state spaces. Speedy Q-learning is a variant of Q-learning designed to address its slow convergence. To address the "curse of dimensionality" in multi-agent reinforcement learning, this paper proposes an action sampling algorithm based on Speedy Q-learning (ASSQ). The algorithm adopts the centralized training with decentralized execution (CTDE) framework and takes the Q-value updated in the previous iteration step as the maximum Q-value of the next state, which effectively reduces the number of Q-value comparisons and improves the overall convergence speed. To reduce the computational load of the learning stage, the algorithm does not traverse the Q-values of all joint actions when computing the maximum Q-value of the next state during centralized training; instead, it samples only a portion of the joint action space. In the action selection and execution stage, each agent independently chooses its action according to the learned policy, which effectively improves learning efficiency. Verified on a target transportation task, the ASSQ algorithm learns the optimal joint policy with a 100% success rate, and its computational cost is significantly lower than that of Q-learning.
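To make the two ideas in the abstract concrete, the following is a minimal Python sketch of a Speedy-Q-learning-style update in which the maximum over the next state's joint actions is estimated from a random sample of the joint action space rather than a full sweep. This is an illustration only, not the authors' implementation: the state and joint-action counts, the sample size sample_k, the step-size schedule, and the helper names (sampled_max, assq_update) are all assumptions made for this sketch.

    # Illustrative sketch (not the paper's code): Speedy Q-learning update with
    # partial sampling of the joint action space, per the abstract's description.
    import numpy as np

    rng = np.random.default_rng(0)

    n_states = 10          # hypothetical number of discrete states
    n_joint_actions = 64   # e.g. 2 agents x 8 actions each, flattened
    gamma, sample_k = 0.95, 8

    Q_prev = np.zeros((n_states, n_joint_actions))  # Q_{k-1}
    Q_curr = np.zeros((n_states, n_joint_actions))  # Q_k

    def sampled_max(q_row, k):
        """Estimate max_a Q(s', a) from k sampled joint actions instead of all."""
        idx = rng.choice(len(q_row), size=k, replace=False)
        return q_row[idx].max()

    def assq_update(s, a, r, s_next, step):
        """One Speedy-Q-style update using sampled maxima over joint actions."""
        alpha = 1.0 / (step + 1)  # decaying step size, as in Speedy Q-learning
        # Empirical Bellman backups under Q_{k-1} and Q_k, each with a sampled max
        t_prev = r + gamma * sampled_max(Q_prev[s_next], sample_k)
        t_curr = r + gamma * sampled_max(Q_curr[s_next], sample_k)
        new_q = (Q_curr[s, a]
                 + alpha * (t_prev - Q_curr[s, a])
                 + (1.0 - alpha) * (t_curr - t_prev))
        Q_prev[s, a] = Q_curr[s, a]  # carry Q_k forward as the new Q_{k-1}
        Q_curr[s, a] = new_q

    # One hypothetical transition (s, a, r, s') just to exercise the update
    assq_update(s=0, a=3, r=1.0, s_next=1, step=0)

The sampled maximum replaces the exhaustive comparison over all n_joint_actions Q-values, which is where the abstract's claimed reduction in computation comes from; under CTDE, each agent would then execute its own component of the learned joint policy independently.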
Keywords: Q-learning; Speedy Q-learning; multi-agent reinforcement learning; action sampling
Classification Code: TP18 [Automation and Computer Technology / Control Theory and Control Engineering]