Research on Value Function Decomposition Method Based on UA-QMIX


Authors: JI Long; MIAO Guo-ying[1]; LI Tao[1]; ZHANG Jing-yi (School of Automation, Nanjing University of Information Science & Technology, Nanjing, Jiangsu 210044, China)

Affiliation: [1] School of Automation, Nanjing University of Information Science & Technology, Nanjing, Jiangsu 210044, China

Source: Computer Simulation (《计算机仿真》), 2022, Issue 11, pp. 448-452 (5 pages)

Funding: Jiangsu Province "333 Project" (BRA2020067); National Natural Science Foundation of China (62073169).

摘  要:针对智能体通信时受外界信息轰炸、协作式多智能体在训练初期的无效探索等问题,提出一种改进的UA-QMIX算法。通过价值函数分解理论和集中式学习分布式执行作为基本条件,在智能体效用网络中加入注意力机制,增强智能体之间对彼此影响力的关注。采用传统的ε-贪婪策略来平衡探索与利用,改进ε-贪婪策略为理性ε-贪婪策略,减少盲目探索。仿真结果表明,所提算法有效降低信息过载以及训练初期的无效探索,且在星际争霸中的收敛速度和平均胜率都达到了最优。An improved UA-QMIX algorithm is proposed to solve the problems of agent communication being bombarded by external information and the ineffective exploration of cooperative multi-agent at the initial training stage.First of all,with the value function decomposition theory and centralized learning distributed execution as the basic conditions,the attention mechanism was added to the agent’s utility network to enhance the attention of the agents to each other.Then,the traditionalε-greedy policy was adopted to balance exploration and utilization,and theε-greedy policy was improved to a rationalε-greedy policy to reduce blind exploration.The simulation results show that the algorithm in our work has effectively reduced information overloadand the invalid exploration at the beginning of training.Moreover,the convergence speed and average win rate in StarCraft have reached the best.

Keywords: reinforcement learning; multi-agent; deep learning; attention mechanism

Classification Code: TP18 [Automation and Computer Technology - Control Theory and Control Engineering]

 
