Authors: MIAO Guoying [1]; SUN Yingbo; WANG Huiqin
Affiliation: [1] School of Automation, Nanjing University of Information Science and Technology, Nanjing 210044, Jiangsu, China
Source: Control Engineering of China, 2025, No. 4, pp. 691-698 (8 pages)
Funding: National Natural Science Foundation of China (62073169)
Abstract: To improve the intelligent decision-making ability of multi-agent systems, a multi-agent reinforcement learning algorithm based on a prioritized value network is proposed. It addresses the drawbacks of experience replay in multi-agent reinforcement learning, as well as the tendency of agent decision-making to emphasize action values while ignoring state values. First, the algorithm introduces a prioritized experience replay mechanism that reuses experience according to importance weights, resolving the problems of experience reuse via uniform random sampling. Second, a value-advantage network structure is introduced into each agent's value network; by contrasting state-value and action-advantage information, agents learn advantageous actions faster. Experimental results in multiple cooperative scenarios show that the algorithm improves the learning and cooperation quality of the multi-agent system, enabling agents to make decisions faster and better and to complete the given tasks.
Keywords: multi-agent; reinforcement learning; prioritized experience replay; value-advantage network; state value
Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
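The two mechanisms named in the abstract, prioritized experience replay with importance weights and a value-advantage (dueling-style) decomposition Q(s, a) = V(s) + A(s, a) - mean_a A(s, a), can be sketched as follows. This is a minimal illustration assuming the standard formulations of both techniques; the paper's exact network architecture, priority definition, and hyperparameters are not given in this record, so all names and defaults below are placeholders.

```python
import numpy as np

class PrioritizedReplay:
    """Minimal prioritized experience replay buffer (standard formulation,
    not necessarily the paper's exact variant)."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha  # controls how strongly priority skews sampling
        self.data = []
        self.priorities = []

    def add(self, transition, priority=1.0):
        if len(self.data) >= self.capacity:
            # drop the oldest transition when the buffer is full
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size, beta=0.4, rng=None):
        rng = rng or np.random.default_rng()
        p = np.asarray(self.priorities, dtype=float) ** self.alpha
        probs = p / p.sum()
        idx = rng.choice(len(self.data), size=batch_size, p=probs)
        # importance-sampling weights correct the bias introduced by
        # non-uniform sampling; normalized so the largest weight is 1
        weights = (len(self.data) * probs[idx]) ** (-beta)
        weights /= weights.max()
        return idx, [self.data[i] for i in idx], weights

def dueling_q(state_value, advantages):
    """Combine a scalar V(s) with per-action advantages A(s, .) using the
    mean-subtracted dueling aggregation, so mean_a Q(s, a) = V(s)."""
    advantages = np.asarray(advantages, dtype=float)
    return state_value + advantages - advantages.mean()
```

A transition with a larger priority (e.g. a larger temporal-difference error) is replayed more often, while the returned weights down-scale its gradient contribution; the dueling aggregation lets the agent compare actions through their advantages while the state value is learned separately.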