Authors: ZHAO Dejing (赵德京), MA Hongcong (马洪聪), WANG Jiayao (王家曜), ZHOU Weiqing (周维庆)
Affiliations: [1] School of Automation, Qingdao University, Qingdao, Shandong 266071, China; [2] Qingdao Petrochemical Maintenance and Installation Engineering Co., Ltd., Qingdao, Shandong 266043, China
Source: Automation & Instrumentation, 2022, No. 6, pp. 13-16, 22 (5 pages)
Funding: Qingdao Postdoctoral Applied Research Project "AGV Road Network Design and Path Planning Method Based on Multi-Agent Reinforcement Learning".
Abstract: As a classical reinforcement learning algorithm, Q-learning suffers from high computational cost and slow convergence in discrete state spaces. Speedy Q-learning is a variant of Q-learning designed to address its slow convergence. To address the "curse of dimensionality" in multi-agent reinforcement learning, this paper proposes an action sampling algorithm based on Speedy Q-learning (ASSQ). The algorithm adopts the centralized training with decentralized execution (CTDE) framework and takes the Q-value updated in the previous iteration step as the maximum Q-value of the next state, which effectively reduces the number of Q-value comparisons and improves the overall convergence speed. To reduce the computational load of the learning stage, the algorithm does not traverse the Q-values of all joint actions when computing the maximum Q-value of the next state during centralized training; instead, it samples only a portion of the joint action space. In the action selection and execution stage, each agent independently chooses its action according to the learned policy, which effectively improves learning efficiency. Verified on a target transportation task, the ASSQ algorithm learns the optimal joint policy with a 100% success rate, and its computational cost is significantly lower than that of Q-learning.
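To make the two ideas in the abstract concrete, the following is a minimal Python sketch of a Speedy-Q-learning-style update in which the maximum over the next state's joint actions is estimated from a random sample of the joint action space rather than a full sweep. This is an illustration only, not the authors' implementation: the state and joint-action counts, the sample size sample_k, the step-size schedule, and the helper names (sampled_max, assq_update) are all assumptions made for this sketch.

    # Illustrative sketch (not the paper's code): Speedy Q-learning update with
    # partial sampling of the joint action space, per the abstract's description.
    import numpy as np

    rng = np.random.default_rng(0)

    n_states = 10          # hypothetical number of discrete states
    n_joint_actions = 64   # e.g. 2 agents x 8 actions each, flattened
    gamma, sample_k = 0.95, 8

    Q_prev = np.zeros((n_states, n_joint_actions))  # Q_{k-1}
    Q_curr = np.zeros((n_states, n_joint_actions))  # Q_k

    def sampled_max(q_row, k):
        """Estimate max_a Q(s', a) from k sampled joint actions instead of all."""
        idx = rng.choice(len(q_row), size=k, replace=False)
        return q_row[idx].max()

    def assq_update(s, a, r, s_next, step):
        """One Speedy-Q-style update using sampled maxima over joint actions."""
        alpha = 1.0 / (step + 1)  # decaying step size, as in Speedy Q-learning
        # Empirical Bellman backups under Q_{k-1} and Q_k, each with a sampled max
        t_prev = r + gamma * sampled_max(Q_prev[s_next], sample_k)
        t_curr = r + gamma * sampled_max(Q_curr[s_next], sample_k)
        new_q = (Q_curr[s, a]
                 + alpha * (t_prev - Q_curr[s, a])
                 + (1.0 - alpha) * (t_curr - t_prev))
        Q_prev[s, a] = Q_curr[s, a]  # carry Q_k forward as the new Q_{k-1}
        Q_curr[s, a] = new_q

    # One hypothetical transition (s, a, r, s') just to exercise the update
    assq_update(s=0, a=3, r=1.0, s_next=1, step=0)

The sampled maximum replaces the exhaustive comparison over all n_joint_actions Q-values, which is where the abstract's claimed reduction in computation comes from; under CTDE, each agent would then execute its own component of the learned joint policy independently.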
Keywords: Q-learning; Speedy Q-learning; multi-agent reinforcement learning; action sampling
Classification Code: TP18 [Automation and Computer Technology / Control Theory and Control Engineering]