Authors: 方宝富 (FANG Baofu) [1]; 余婷婷 (YU Tingting); 王浩 (WANG Hao) [1]; 王在俊 (WANG Zaijun) [2]
Affiliations: [1] School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, Anhui, China; [2] Key Laboratory of Flight Techniques and Flight Safety, Civil Aviation Flight University of China, Guanghan 618307, Sichuan, China
Source: Robot (《机器人》), 2024, No. 6, pp. 663-671, 682 (10 pages)
Funding: National Natural Science Foundation of China (61872327); Natural Science Foundation of Anhui Province (2308085MF203); Collaborative Innovation Program of Anhui Universities (GXXT-2022-055); Open Fund of the Key Laboratory of Flight Techniques and Flight Safety (FZ2022KF09).
Abstract: Sparse reward is one of the main challenges in multi-agent reinforcement learning. Existing algorithms struggle to train agent teams effectively in sparse-reward scenarios, which leads to low exploration efficiency. Inspired by how humans learn when rewards are scarce, this paper proposes a multi-agent reinforcement learning algorithm based on adaptive state approximation (MAASA). By considering the similarity among agent states, the algorithm adaptively retrieves approximate states from the replay buffer, adds them to a candidate state set, and uses the exploration information in that set to promote policy training. In addition, MAASA uses the distance between the approximate state and the current local state as an intrinsic reward, guiding the agents to explore the unknown environment more effectively while maximizing the joint state-action value and to find the optimal policy quickly. Experimental results show that the proposed algorithm outperforms existing reinforcement learning methods in multi-agent pursuit (predator-prey) scenarios with different degrees of reward sparsity, demonstrating its robustness and effectiveness and accelerating the agents' learning.
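The abstract describes two mechanisms: retrieving an approximate state from the replay buffer by state similarity, and using the distance between that approximate state and the agent's current local state as an intrinsic reward. Below is a minimal Python sketch of those two steps only; the Euclidean distance metric, the function names, and the scale parameter are illustrative assumptions and not the paper's actual implementation.

import numpy as np

def retrieve_approximate_state(local_state, replay_states):
    """Return the stored state most similar to the current local state.

    Similarity is measured here with Euclidean distance; the paper's exact
    metric is not given in the abstract, so this choice is an assumption.
    """
    dists = np.linalg.norm(replay_states - local_state, axis=1)
    idx = int(np.argmin(dists))
    return replay_states[idx], dists[idx]

def intrinsic_reward(local_state, replay_states, scale=1.0):
    """Intrinsic reward: distance between the retrieved approximate state and
    the current local state (a larger distance suggests a more novel state)."""
    _, dist = retrieve_approximate_state(local_state, replay_states)
    return scale * dist

# Toy usage: one agent's 2-D local state against a tiny buffer of past states.
replay_states = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
current = np.array([1.8, 1.9])
print(intrinsic_reward(current, replay_states))

In this sketch the intrinsic bonus is added on top of the (sparse) environment reward during training, so states far from anything already stored in the replay buffer receive a larger exploration bonus.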
CLC Number: TP18 [Automation and Computer Technology / Control Theory and Control Engineering]