Multi-agent Cooperative Reinforcement Learning Algorithm Based on Prioritized Value Network

Authors: MIAO Guoying (苗国英) [1]; SUN Yingbo (孙英博); WANG Huiqin (王慧琴)

Affiliation: [1] School of Automation, Nanjing University of Information Science and Technology, Nanjing 210044, Jiangsu, China

Source: Control Engineering of China (《控制工程》), 2025, No. 4, pp. 691-698 (8 pages)

Funding: National Natural Science Foundation of China (62073169).

Abstract: To improve the decision-making ability of multi-agent systems, a multi-agent reinforcement learning algorithm based on a prioritized value network is proposed. It addresses two problems: the drawbacks of experience replay in multi-agent reinforcement learning, and the tendency of agent decision-making to emphasize action values while ignoring state values. First, the algorithm introduces a prioritized experience replay mechanism that reuses experience according to importance weights, which resolves the problems of reusing experience through random sampling. Second, the algorithm introduces a value advantage network structure into each agent's value network; by comparing state-value information with action-advantage information, the agent learns advantageous actions faster. Experimental results in multiple cooperative scenarios show that the algorithm improves the learning and cooperation quality of the multi-agent system, enabling agents to make decisions faster and better and to complete the given tasks.
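The abstract does not give the details of the replay mechanism, so the following is only a minimal sketch of prioritized experience replay in its common proportional form, not the authors' implementation. It assumes that priorities come from absolute TD errors and that importance weights are normalized by their maximum. The class name `PrioritizedReplayBuffer` and the parameters `alpha`, `beta`, and `eps` are hypothetical names chosen for illustration.

```python
import random


class PrioritizedReplayBuffer:
    """Proportional prioritized replay (illustrative sketch, not the paper's code)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha  # assumed: 0 gives uniform sampling, 1 gives fully proportional
        self.eps = eps      # assumed: keeps every priority strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0        # next slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions receive the current maximum priority so they are not starved.
        max_p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(max_p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4, rng=random):
        n = len(self.data)
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        # Sample indices with probability proportional to priority ** alpha.
        idx = rng.choices(range(n), weights=probs, k=batch_size)
        # Importance weights correct the bias introduced by non-uniform sampling.
        weights = [(n * probs[i]) ** (-beta) for i in idx]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        batch = [self.data[i] for i in idx]
        return batch, idx, weights

    def update_priorities(self, idx, td_errors):
        # Larger absolute TD error means the transition is replayed more often.
        for i, err in zip(idx, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a training loop under these assumptions, each sampled transition's loss would be multiplied by its weight, and `update_priorities` would be called with the freshly computed TD errors after the update step.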

Keywords: multi-agent reinforcement learning; prioritized experience replay; value advantage network; state value

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
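The value advantage network named in the abstract separates a state value from per-action advantages. The sketch below shows one common way to recombine the two streams, the mean-subtracted dueling aggregation Q(s, a) = V(s) + A(s, a) - mean(A(s, ·)). Whether the paper uses exactly this form is an assumption, and the function names `dueling_q_values` and `greedy_action` are hypothetical.

```python
def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) with mean-subtracted aggregation.

    Assumed form (not confirmed by the abstract):
        Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')
    """
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + (a - mean_adv) for a in advantages]


def greedy_action(q_values):
    """Return the index of the action with the largest Q-value."""
    return max(range(len(q_values)), key=q_values.__getitem__)


# Example: V(s) = 2.0 and three actions with advantages 1.0, 3.0, -1.0.
q = dueling_q_values(2.0, [1.0, 3.0, -1.0])
best = greedy_action(q)
```

Subtracting the mean advantage makes the average of the Q-values equal V(s), so the state value and the relative ranking of actions are learned by separate streams; here `q` is `[2.0, 4.0, 0.0]` and `best` is `1`.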
