Authors: Yan Song, He Jiang, Zheng Tian, Haifeng Zhang, Yingping Zhang, Jiangcheng Zhu, Zonghong Dai, Weinan Zhang, Jun Wang
Affiliations: [1] Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; [2] Digital Brain Lab, Shanghai 200001, China; [3] ShanghaiTech University, Shanghai 200001, China; [4] Huawei Cloud, Guiyang 550003, China; [5] Shanghai Jiao Tong University, Shanghai 200001, China; [6] University College London, London WC1E 6PT, UK
Source: Machine Intelligence Research, 2024, No. 3, pp. 549-570 (22 pages)
Funding: Supported by the National Natural Science Foundation of China (No. 62206289).
Abstract: Few multi-agent reinforcement learning (MARL) studies on Google Research Football (GRF) focus on the 11-vs-11 multi-agent full-game scenario, and to the best of our knowledge, no open benchmark on this scenario has been released to the public. In this work, we fill the gap by providing a population-based MARL training pipeline and hyperparameter settings on the multi-agent football scenario that outperform the bot at difficulty 1.0, trained from scratch within 2 million steps. Our experiments serve as a reference for the expected performance of independent proximal policy optimization (IPPO), a state-of-the-art multi-agent reinforcement learning algorithm in which each agent maximizes its own policy objective independently, across various training configurations. Meanwhile, we release our training framework Light-MALib, which extends the MALib codebase with a distributed and asynchronous implementation and additional analytical tools for football games. Finally, we provide guidance for building strong football AI with population-based training and release diverse pretrained policies for benchmarking. The goal is to give the community a head start for experimenting on GRF, together with a simple-to-use population-based training framework for further improving agents through self-play. The implementation is available at https://github.com/Shanghai-Digital-Brain-Laboratory/DB-Football.
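The defining trait of IPPO, as the abstract notes, is that each agent updates its own policy independently from its own experience, with no centralized critic. The toy sketch below illustrates only that independent-learner structure in a two-agent coordination game; it uses a plain REINFORCE-style update rather than the PPO clipped objective, and none of the names or the game come from the paper.

```python
import math
import random

random.seed(0)

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

class IndependentAgent:
    """Keeps its own policy and updates it from its own reward only --
    the core structure of IPPO-style independent learners (the real
    pipeline uses the PPO clipped objective; this is a simplified
    policy-gradient update for illustration)."""
    def __init__(self, n_actions, lr=0.5):
        self.logits = [0.0] * n_actions
        self.lr = lr

    def act(self):
        probs = softmax(self.logits)
        r, cum = random.random(), 0.0
        for a, p in enumerate(probs):
            cum += p
            if r < cum:
                return a
        return len(probs) - 1

    def update(self, action, reward):
        # REINFORCE gradient for a softmax policy: (1{a=action} - pi(a))
        probs = softmax(self.logits)
        for a in range(len(self.logits)):
            grad = (1.0 if a == action else 0.0) - probs[a]
            self.logits[a] += self.lr * reward * grad

# Hypothetical coordination game: both agents receive reward 1 when
# they pick the same action; each learns from that shared reward
# independently, never seeing the other agent's policy.
agents = [IndependentAgent(2), IndependentAgent(2)]
for _ in range(2000):
    acts = [ag.act() for ag in agents]
    reward = 1.0 if acts[0] == acts[1] else 0.0
    for ag, a in zip(agents, acts):
        ag.update(a, reward)

probs = [softmax(ag.logits) for ag in agents]
# The two independent policies typically converge on the same action.
print(probs[0].index(max(probs[0])) == probs[1].index(max(probs[1])))
```

The same decentralized pattern scales to the 11-agent full-game setting described above, where each footballer's policy is optimized against its own (shaped) reward while the population-based outer loop manages opponents.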
Keywords: multi-agent reinforcement learning (RL); distributed RL system; population-based training; reward shaping; game theory
Classification: TP391.41 [Automation and Computer Technology: Computer Application Technology]