Authors: Kun Jiang, Wenzhang Liu, Yuanda Wang, Lu Dong, Changyin Sun
Affiliations: [1] School of Automation, Southeast University, Nanjing 210096, China; [2] School of Artificial Intelligence, Anhui University, Hefei 230601, China; [3] IEEE; [4] School of Cyber Science and Engineering, Southeast University, Nanjing 211189, China; [5] Engineering Research Center of Autonomous Unmanned System Technology, Ministry of Education, Hefei 230601, China
Source: IEEE/CAA Journal of Automatica Sinica, 2024, Issue 7, pp. 1591-1604 (14 pages). (Acta Automatica Sinica, English edition)
Funding: Supported in part by the National Natural Science Foundation of China (62136008, 62236002, 61921004, 62173251, 62103104); the "Zhishan" Scholars Programs of Southeast University; and the Fundamental Research Funds for the Central Universities (2242023K30034).
Abstract: Efficient exploration in complex coordination tasks has been considered a challenging problem in multi-agent reinforcement learning (MARL). It is significantly more difficult for tasks with latent variables that agents cannot directly observe. However, most existing latent variable discovery methods lack a clear representation of latent variables and an effective evaluation of the influence of latent variables on the agent. In this paper, we propose a new MARL algorithm based on the soft actor-critic method for complex continuous control tasks with confounders. It is called the multi-agent soft actor-critic with latent variable (MASAC-LV) algorithm, which uses variational inference theory to infer a compact latent variable representation space from a large amount of offline experience. Besides, we derive the counterfactual policy whose input has no latent variables and quantify the difference between the actual policy and the counterfactual policy via a distance function. This quantified difference is treated as an intrinsic motivation that gives additional rewards based on how much the latent variable affects each agent. The proposed algorithm is evaluated on two collaboration tasks with confounders, and the experimental results demonstrate the effectiveness of MASAC-LV compared to other baseline algorithms.
Keywords: latent variable model; maximum entropy; multi-agent reinforcement learning (MARL); multi-agent system
Classification: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]
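The abstract's intrinsic reward, the quantified gap between the actual (latent-conditioned) policy and the counterfactual policy, can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes diagonal Gaussian policies (standard in soft actor-critic) and uses KL divergence as the distance function; the `actor` interface, the `beta` scaling weight, and the convention of passing `z=None` for the counterfactual policy are all assumptions for illustration.

```python
import numpy as np

def gaussian_kl(mu_p, std_p, mu_q, std_q):
    """KL divergence KL(p || q) between diagonal Gaussians, summed over action dims."""
    var_p, var_q = std_p ** 2, std_q ** 2
    return np.sum(
        np.log(std_q / std_p) + (var_p + (mu_p - mu_q) ** 2) / (2.0 * var_q) - 0.5
    )

def intrinsic_reward(actor, obs, z, beta=0.1):
    """Bonus proportional to how much the latent z shifts this agent's policy.

    actor(obs, z) -> (mu, std) of a Gaussian action distribution;
    actor(obs, None) is the counterfactual policy with no latent input.
    """
    mu, std = actor(obs, z)            # actual policy, conditioned on the latent
    mu_cf, std_cf = actor(obs, None)   # counterfactual policy, latent removed
    return beta * gaussian_kl(mu, std, mu_cf, std_cf)
```

When the latent variable has no effect on an agent's action distribution, the two policies coincide and the bonus is zero, so exploration pressure concentrates on agents the confounder actually influences.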