Attention-based information preprocessing multi-agent reinforcement learning algorithm

Authors: Du Yongtao (杜泳韬); Zhao Lingzhong (赵岭忠); Zhai Zhongyi (翟仲毅)

Affiliation: [1] School of Computer and Information Security, Guilin University of Electronic Technology, Guilin 541004, China

Source: Foreign Electronic Measurement Technology (《国外电子测量技术》), 2024, No. 3, pp. 91-97 (7 pages)

Abstract: Multi-agent reinforcement learning has a broad range of applications in group control. However, traditional reinforcement learning methods, such as Q-learning and policy gradient, perform poorly in multi-agent environments. During training, the policy of every agent keeps changing. By the time one agent makes a decision based on environmental information, the decisions of the other agents may already have altered that information. As a result, the state-transition probability distribution and the reward function perceived by the agent change, the environment becomes non-stationary, and training cannot proceed effectively. To mitigate this problem, this paper studies a multi-agent reinforcement learning algorithm based on multi-head self-attention. The method takes the action policies of the other agents into account and uses multi-head self-attention to let each agent learn which factors have the greatest influence on its decisions. With this mechanism, the agents learn complex multi-agent coordination policies. In the experiments, the average return reaches 0.82, far higher than that of the traditional algorithms. The results show that the proposed algorithm effectively addresses the learning difficulties caused by environmental non-stationarity and improves the convergence speed and stability of multi-agent reinforcement learning.
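The abstract does not specify the network architecture. The sketch below illustrates the general idea under stated assumptions. It assumes that each agent's observation (and possibly its action) has already been encoded into a `d_model`-dimensional vector. Standard scaled dot-product multi-head self-attention then lets each agent compute a weighted summary of all agents' information, and each head can learn a different notion of "which agent matters". This is not the authors' implementation. The function name `multi_head_self_attention`, the `exclude_self` option (which forces an agent to attend only to the others), and the random weight matrices are all hypothetical, chosen for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; entries equal to -inf get probability 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(h, W_q, W_k, W_v, W_o, num_heads, exclude_self=False):
    """Hypothetical sketch: multi-head self-attention over per-agent embeddings.

    h: (n_agents, d_model) array, one (assumed) encoded observation per agent.
    Returns the attended features (n_agents, d_model) and the attention
    weights (num_heads, n_agents, n_agents).
    """
    n, d_model = h.shape
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads

    def split(x):
        # (n, d_model) -> (num_heads, n, d_head)
        return x.reshape(n, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(h @ W_q), split(h @ W_k), split(h @ W_v)
    # Scaled dot-product scores between every pair of agents, per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, n, n)
    if exclude_self:
        assert n >= 2, "need at least two agents to exclude self-attention"
        # Mask the diagonal so each agent attends only to the other agents.
        scores = np.where(np.eye(n, dtype=bool), -np.inf, scores)
    weights = softmax(scores, axis=-1)
    context = weights @ v  # (heads, n, d_head)
    # Concatenate the heads and apply the output projection.
    concat = context.transpose(1, 0, 2).reshape(n, d_model)
    return concat @ W_o, weights


# Usage with random (illustrative) parameters: 3 agents, 8-dim embeddings, 2 heads.
rng = np.random.default_rng(0)
n_agents, d_model, num_heads = 3, 8, 2
h = rng.normal(size=(n_agents, d_model))
W_q, W_k, W_v, W_o = (rng.normal(scale=d_model ** -0.5, size=(d_model, d_model)) for _ in range(4))
out, attn = multi_head_self_attention(h, W_q, W_k, W_v, W_o, num_heads)
print(out.shape, attn.shape)  # (3, 8) (2, 3, 3)
```

In a learned model, `W_q`, `W_k`, `W_v` and `W_o` would be trained together with the policy or critic. Each row of `attn[head]` then shows how strongly one agent weights the information of every agent.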

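Keywords: multi-agent reinforcement learning; multi-head self-attention; information preprocessing; policy gradient; non-stationarity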

CLC number: TP301 [Automation and Computer Technology: Computer Systems Architecture]
