Authors: 方宝富 (FANG Baofu) [1]; 余婷婷 (YU Tingting); 王浩 (WANG Hao) [1]; 王在俊 (WANG Zaijun) [2]
Affiliations: [1] School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, Anhui, China; [2] Key Laboratory of Flight Techniques and Flight Safety, Civil Aviation Flight University of China, Guanghan 618307, Sichuan, China
Source: Robot (《机器人》), 2024, No. 6, pp. 663-671, 682 (10 pages)
Funding: National Natural Science Foundation of China (61872327); Natural Science Foundation of Anhui Province (2308085MF203); Collaborative Innovation Program of Anhui Universities (GXXT-2022-055); Open Fund of the Key Laboratory of Flight Techniques and Flight Safety (FZ2022KF09).
Abstract: Sparse reward is one of the main challenges in multi-agent reinforcement learning. Existing algorithms struggle to train agent teams effectively in sparse-reward scenarios, which leads to low exploration efficiency. Inspired by how humans learn when rewards are scarce, this paper proposes a multi-agent reinforcement learning algorithm based on adaptive state approximation (MAASA). By considering the similarity among agent states, the algorithm adaptively retrieves approximate states from the replay buffer, adds them to a candidate state set, and uses the exploration information in that set to promote policy training. In addition, MAASA uses the distance between the approximate state and the current local state as an intrinsic reward, guiding the agents to explore the unknown environment more effectively while maximizing the joint state-action value and to find the optimal policy quickly. Experimental results show that the proposed algorithm outperforms existing reinforcement learning methods in multi-agent pursuit (predator-prey) scenarios with different degrees of reward sparsity, demonstrating its robustness and effectiveness and accelerating the agents' learning.
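The abstract describes two mechanisms: retrieving an approximate state from the replay buffer by state similarity, and using the distance between that approximate state and the agent's current local state as an intrinsic reward. Below is a minimal Python sketch of those two steps only; the Euclidean distance metric, the function names, and the scale parameter are illustrative assumptions and not the paper's actual implementation.

import numpy as np

def retrieve_approximate_state(local_state, replay_states):
    """Return the stored state most similar to the current local state.

    Similarity is measured here with Euclidean distance; the paper's exact
    metric is not given in the abstract, so this choice is an assumption.
    """
    dists = np.linalg.norm(replay_states - local_state, axis=1)
    idx = int(np.argmin(dists))
    return replay_states[idx], dists[idx]

def intrinsic_reward(local_state, replay_states, scale=1.0):
    """Intrinsic reward: distance between the retrieved approximate state and
    the current local state (a larger distance suggests a more novel state)."""
    _, dist = retrieve_approximate_state(local_state, replay_states)
    return scale * dist

# Toy usage: one agent's 2-D local state against a tiny buffer of past states.
replay_states = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
current = np.array([1.8, 1.9])
print(intrinsic_reward(current, replay_states))

In this sketch the intrinsic bonus is added on top of the (sparse) environment reward during training, so states far from anything already stored in the replay buffer receive a larger exploration bonus.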
CLC Number: TP18 [Automation and Computer Technology / Control Theory and Control Engineering]