Multi-agent Cooperative Reinforcement Learning Algorithm Based on Prioritized Value Network

Authors: MIAO Guoying (苗国英) [1], SUN Yingbo (孙英博), WANG Huiqin (王慧琴) (School of Automation, Nanjing University of Information Science and Technology, Nanjing 210044, China)

Affiliation: [1] School of Automation, Nanjing University of Information Science and Technology, Nanjing, Jiangsu 210044, China

Source: Control Engineering of China (《控制工程》), 2025, No. 4, pp. 691-698 (8 pages)

Funding: National Natural Science Foundation of China (Grant No. 62073169).

Abstract: To improve the intelligent decision-making ability of multi-agent systems, this paper proposes a multi-agent reinforcement learning algorithm based on a prioritized value network. The algorithm targets two problems: the drawbacks of experience replay in multi-agent reinforcement learning, and agent decision-making that emphasizes action values while ignoring state values. First, the algorithm introduces a prioritized experience replay mechanism that reuses experiences according to importance weights, addressing the problems that arise when experience is reused through random sampling. Second, it introduces a value-advantage network form into each agent's value network, contrasting state-value information with action-advantage information so that agents learn advantageous actions more quickly. Experimental results in multiple cooperative scenarios show that the algorithm improves the learning and cooperation quality of the multi-agent system, enabling agents to make better decisions faster and complete the given tasks.
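The abstract names two mechanisms but gives no formulas, so the following is only a minimal sketch of their common textbook forms, not the authors' implementation. It assumes that the prioritized replay follows the usual proportional scheme: sampling probability grows with a transition's TD-error priority, and importance-sampling weights correct the resulting bias. It also assumes that the "value-advantage network" uses the standard aggregation Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a'). The class and function names, the hyperparameters `alpha=0.6` and `beta=0.4`, and the single shared buffer are hypothetical. The abstract does not say how priorities or networks are organized across agents.

```python
import random
from typing import Any, List, Tuple


class PrioritizedReplayBuffer:
    """Hypothetical proportional prioritized replay buffer (flat-list sketch).

    Sampling is O(N) over a plain list; a real buffer would usually use a sum-tree.
    """

    def __init__(self, capacity: int, alpha: float = 0.6, eps: float = 1e-6):
        self.capacity = capacity
        self.alpha = alpha          # 0 -> uniform sampling, 1 -> fully priority-driven
        self.eps = eps              # keeps every priority strictly positive
        self.data: List[Any] = []
        self.priorities: List[float] = []
        self.pos = 0                # next slot to overwrite once the buffer is full

    def add(self, transition: Any) -> None:
        # New transitions get the current maximum priority so each is replayed at least once.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size: int, beta: float = 0.4) -> Tuple[List[int], List[Any], List[float]]:
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        n = len(self.data)
        idx = random.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights (N * P(i))^-beta offset the non-uniform sampling.
        weights = [(n * probs[i]) ** (-beta) for i in idx]
        max_w = max(weights)
        weights = [w / max_w for w in weights]  # normalise within the batch so the largest is 1
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, idx: List[int], td_errors: List[float]) -> None:
        # Priority is the absolute TD error plus a small constant.
        for i, delta in zip(idx, td_errors):
            self.priorities[i] = abs(delta) + self.eps


def dueling_q(state_value: float, advantages: List[float]) -> List[float]:
    """Combine V(s) and A(s, a) into Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')."""
    mean_a = sum(advantages) / len(advantages)
    return [state_value + a - mean_a for a in advantages]


if __name__ == "__main__":
    random.seed(0)
    buf = PrioritizedReplayBuffer(capacity=4)
    for t in range(5):                      # the fifth transition overwrites slot 0
        buf.add(t)
    buf.update_priorities([0], [2.0])       # a large TD error makes slot 0 more likely
    idx, batch, w = buf.sample(3)
    print(idx, batch, w)
    print(dueling_q(2.0, [1.0, -1.0, 3.0]))  # [2.0, 0.0, 4.0]
```

Subtracting the mean advantage makes the split between V and A identifiable: the resulting Q-values always average to V(s), while their differences come only from the advantages. This is one plausible reading of how contrasting state values with action advantages lets an agent rank actions quickly.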

Keywords: multi-agent; reinforcement learning; prioritized experience replay; value advantage network; state value

CLC number: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
