Authors: Lei YUAN, Feng CHEN, Zongzhang ZHANG, Yang YU
Affiliations: [1] National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China; [2] Polixir Technologies, Nanjing 211106, China
Source: Frontiers of Computer Science (English edition), 2024, No. 6, pp. 101-117 (17 pages)
Funding: the National Key R&D Program of China (2020AAA0107200); the National Natural Science Foundation of China (Grant Nos. 61921006, 61876119, 62276126); the Natural Science Foundation of Jiangsu (BK20221442).
Abstract: Communication can promote coordination in cooperative Multi-Agent Reinforcement Learning (MARL). Existing works mainly focus on improving the communication efficiency of agents, neglecting that real-world communication is much more challenging, as there may be noise or potential attackers. The robustness of communication-based policies thus becomes a pressing and serious issue that needs more exploration. In this paper, we posit that an ego system trained with auxiliary adversaries may handle this limitation, and we propose an adaptable method of Multi-Agent Auxiliary Adversaries Generation for robust Communication, dubbed MA3C, to obtain a robust communication-based policy. Specifically, we introduce a novel message-attacking approach that models the learning of the auxiliary attacker as a cooperative problem under a shared goal of minimizing the coordination ability of the ego system, with which every information channel may suffer from distinct message attacks. Furthermore, as naive adversarial training may impede the generalization ability of the ego system, we design an attacker population generation approach based on evolutionary learning. Finally, the ego system is paired with an attacker population and then alternately trained against the continuously evolving attackers to improve its robustness, meaning that both the ego system and the attackers are adaptable. Extensive experiments on multiple benchmarks indicate that the proposed MA3C provides robustness and generalization ability comparable to or better than other baselines.
Keywords: multi-agent communication; adversarial training; robustness validation; reinforcement learning
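Classification codes: TN914 [Electronics and Telecommunications: Communication and Information Systems]; TP18 [Electronics and Telecommunications: Information and Communication Engineering]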