Title: Cooperative Transportation Method Based on Episodic Memory Reinforcement Learning (基于情景记忆式强化学习的协作运输方法)


Authors: 周维庆 (ZHOU Weiqing), 张震 (ZHANG Zhen), 宋光乐 (SONG Guangle), 刘明阳 (LIU Mingyang), 宋婷婷 (SONG Tingting)

Affiliations: [1] School of Automation, Qingdao University, Qingdao 266071, Shandong, China; [2] Shandong Key Laboratory of Industrial Control Technology, Qingdao 266071, Shandong, China; [3] School of Intelligent Manufacturing, Weifang University of Science and Technology, Weifang 261000, Shandong, China; [4] Vehicle Maintenance Department, Third Operation Center of Qingdao Metro Operation Co., Ltd., Qingdao 266071, Shandong, China

Source: Control Engineering of China (《控制工程》), 2024, No. 7, pp. 1203-1210 (8 pages)

Funding: National Natural Science Foundation of China (61903209).

Abstract: To address the low utilization of samples in the memory pool of episodic memory algorithms, a cooperative multi-agent reinforcement learning algorithm that combines episodic memory with a value function decomposition framework, called the episodic memory value decomposition (EMVD) algorithm, is proposed. In its episodic memory component, EMVD updates the memory pool according to the squared temporal-difference (TD) error, so that the pool always retains the episodic samples that contribute most to improving learning; the episodic memory mechanism is further combined with a neural network to accelerate convergence. To apply EMVD to a robot cooperative transportation task, the positions of the robots and the transportation target are taken as the state, and a reward function is designed. Simulation results show that EMVD can discover the optimal policy for the robot cooperative transportation task and improves the convergence speed of the algorithm.
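The abstract's core mechanism is a memory pool maintained by squared TD error, so that only the samples judged most useful for learning are retained. The sketch below illustrates one plausible reading of that idea as a fixed-capacity pool that evicts the lowest-priority sample first; the class name EpisodicMemoryPool, the capacity, and the uniform mini-batch sampling are illustrative assumptions, not the authors' EMVD implementation.

```python
import heapq
import random


class EpisodicMemoryPool:
    """Fixed-capacity episodic memory keeping the transitions with the
    largest squared TD error (hypothetical sketch, not the paper's code)."""

    def __init__(self, capacity=10000):
        self.capacity = capacity
        self._heap = []      # min-heap of (squared_td_error, insertion_id, transition)
        self._next_id = 0    # tie-breaker so tuples never compare transitions directly

    def add(self, transition, td_error):
        """Insert a transition; evict the lowest-priority sample when full."""
        priority = float(td_error) ** 2
        entry = (priority, self._next_id, transition)
        self._next_id += 1
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
        elif priority > self._heap[0][0]:
            # The heap root holds the smallest squared TD error, so replacing
            # it keeps the samples most important for improving learning.
            heapq.heapreplace(self._heap, entry)

    def sample(self, batch_size):
        """Draw a uniform mini-batch from the retained high-priority samples."""
        chosen = random.sample(self._heap, min(batch_size, len(self._heap)))
        return [transition for _, _, transition in chosen]


# Illustrative usage with a dummy (state, action, reward, next_state) tuple.
pool = EpisodicMemoryPool(capacity=5000)
pool.add(("s0", "a0", 1.0, "s1"), td_error=0.8)
batch = pool.sample(32)
```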

Keywords: reinforcement learning; multi-agent reinforcement learning; episodic memory; robot cooperative transportation; temporal-difference error

CLC number: TP183 [Automation and Computer Technology / Control Theory and Control Engineering]

 
