Authors: SUN Yingbo; MIAO Guoying [1]; ZHUANG Ya'nan (School of Automation, Nanjing University of Information Science & Technology, Nanjing 210044, China)
Affiliation: [1] School of Automation, Nanjing University of Information Science and Technology, Nanjing 210044, Jiangsu, China
Source: Transducer and Microsystem Technologies, 2023, Issue 9, pp. 25-29 (5 pages)
Funding: National Natural Science Foundation of China (62073169); Jiangsu Province "333 Project" (BRA2020067).
Abstract: In multi-agent deep reinforcement learning, the interactions between agents are not fully considered during value-function fitting, and actions are chosen largely at random, which leads to wasted data in the iterative trial-and-error process, low collaboration efficiency, and slow convergence. To address these problems, an average weight mechanism for collaboration and an improved exploration strategy are proposed. First, an averaged deep Q-network (DQN) is used to design a weight structure in the multi-agent value-function policy network, reducing the adverse influence among agents. Second, the exploration strategy is improved using Euclidean distance, which raises the agents' exploration efficiency and policy cooperativeness and increases the system's ability to escape local minima. Experimental results in multiple scenarios show that the proposed method improves both the learning ability and the learning efficiency of the multi-agent system.
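The two ideas in the abstract lend themselves to a brief illustration. Below is a minimal Python sketch, assuming a standard PyTorch DQN setup: blending each agent's Q-network parameters toward the cross-agent mean (one plausible reading of the "average weight mechanism"), and an epsilon-greedy rule whose exploration rate grows with the Euclidean distance to a peer agent. All names (AgentQNet, average_weights, distance_biased_action) and the specific blending and scaling rules are hypothetical, not the authors' published implementation.

```python
import numpy as np
import torch
import torch.nn as nn

class AgentQNet(nn.Module):
    """Simple per-agent Q-network (hypothetical architecture)."""
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, obs):
        return self.net(obs)

def average_weights(nets, alpha=0.5):
    """Blend each agent's parameters toward the mean of all agents'
    parameters -- a sketch of an 'average weight' mechanism intended
    to damp the adverse influence of any single agent's update."""
    with torch.no_grad():
        mean_state = {}
        for key in nets[0].state_dict():
            mean_state[key] = torch.stack(
                [n.state_dict()[key].float() for n in nets]).mean(dim=0)
        for n in nets:
            sd = n.state_dict()
            for key in sd:
                sd[key] = alpha * sd[key] + (1 - alpha) * mean_state[key]
            n.load_state_dict(sd)

def distance_biased_action(qnet, obs, own_pos, peer_pos, n_actions,
                           eps=0.1, d_ref=5.0):
    """Epsilon-greedy action selection whose random-action probability
    grows with the Euclidean distance to a peer, so far-apart agents
    explore more while nearby agents exploit cooperatively (an assumed
    reading of the Euclidean-distance exploration improvement).
    own_pos and peer_pos are 1-D numpy position vectors."""
    d = float(np.linalg.norm(own_pos - peer_pos))
    eps_eff = min(1.0, eps * (1.0 + d / d_ref))  # larger d -> more exploration
    if np.random.rand() < eps_eff:
        return np.random.randint(n_actions)
    with torch.no_grad():
        return int(qnet(torch.as_tensor(obs, dtype=torch.float32)).argmax())
```

A typical usage under these assumptions would call average_weights(nets) once per training round and distance_biased_action(...) at every action-selection step; the blending coefficient alpha trades off each agent's individual learning against consensus across the team.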
Classification: TP242 [Automation and Computer Technology - Detection Technology and Automatic Devices]