Authors: 纪龙 (JI Long); 苗国英 (MIAO Guo-ying)[1]; 李涛 (LI Tao)[1]; 张静怡 (ZHANG Jing-yi)
Affiliation: [1] School of Automation, Nanjing University of Information Science & Technology, Nanjing, Jiangsu 210044, China
Source: Computer Simulation (《计算机仿真》), 2022, No. 11, pp. 448-452 (5 pages)
Funding: Jiangsu Province "333 Project" (BRA2020067); National Natural Science Foundation of China (62073169).
Abstract: To address information overload during inter-agent communication and the ineffective exploration of cooperative multi-agent systems in the early stage of training, an improved UA-QMIX algorithm is proposed. First, taking value-function decomposition theory and centralized training with decentralized execution as the basic framework, an attention mechanism is added to each agent's utility network to strengthen the agents' attention to one another's influence. Then, the traditional ε-greedy policy used to balance exploration and exploitation is improved into a rational ε-greedy policy that reduces blind exploration. Simulation results show that the proposed algorithm effectively reduces information overload and the invalid exploration at the beginning of training, and its convergence speed and average win rate in the StarCraft experiments reach the best results.
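As an illustration of the attention-augmented utility network described in the abstract, the sketch below adds a multi-head attention layer over teammates' hidden states to a QMIX-style per-agent recurrent utility network. The architecture, layer sizes, and the use of PyTorch's nn.MultiheadAttention are assumptions for illustration only; the paper's exact UA-QMIX network is not specified in this record.

```python
# Hedged sketch (not the paper's exact architecture): a QMIX-style per-agent
# utility network with a multi-head attention layer over the other agents'
# hidden states, so each agent can weight its teammates' influence.
import torch
import torch.nn as nn


class AttentionUtilityNet(nn.Module):
    """Per-agent utility (Q) network: observation encoder + GRU + attention over peers."""

    def __init__(self, obs_dim: int, n_actions: int, hidden_dim: int = 64, n_heads: int = 4):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden_dim)
        self.rnn = nn.GRUCell(hidden_dim, hidden_dim)
        # Attention lets this agent attend to the hidden states of the other agents.
        self.attn = nn.MultiheadAttention(hidden_dim, n_heads, batch_first=True)
        self.q_head = nn.Linear(2 * hidden_dim, n_actions)

    def forward(self, obs, h_prev, peer_hidden):
        # obs:         (batch, obs_dim)            local observation of this agent
        # h_prev:      (batch, hidden_dim)         this agent's previous GRU state
        # peer_hidden: (batch, n_peers, hidden_dim) hidden states of the other agents
        x = torch.relu(self.encoder(obs))
        h = self.rnn(x, h_prev)
        # Query with the agent's own state; the attention weights reflect peer influence.
        context, _ = self.attn(h.unsqueeze(1), peer_hidden, peer_hidden)
        q_values = self.q_head(torch.cat([h, context.squeeze(1)], dim=-1))
        return q_values, h


if __name__ == "__main__":
    net = AttentionUtilityNet(obs_dim=30, n_actions=9)
    obs = torch.randn(8, 30)          # batch of 8 local observations
    h0 = torch.zeros(8, 64)           # initial GRU state
    peers = torch.randn(8, 4, 64)     # hidden states of 4 teammates
    q, h1 = net(obs, h0, peers)
    print(q.shape, h1.shape)          # torch.Size([8, 9]) torch.Size([8, 64])
```

In a full QMIX-style pipeline, the per-agent Q-values produced here would then be combined by a monotonic mixing network conditioned on the global state for centralized training.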
Classification: TP18 [Automation and Computer Technology — Control Theory and Control Engineering]