Authors: Zaipeng Xie, Cheng Ji, Chentai Qiao, WenZhan Song, Zewen Li, Yufeng Zhang, Yujing Zhang
Affiliations: [1] Key Laboratory of Water Big Data Technology of Ministry of Water Resources, Hohai University, Nanjing, China; [2] College of Computer and Information, Hohai University, Nanjing, China; [3] Center for Cyber-Physical Systems, University of Georgia, Athens, Georgia, USA; [4] Information Networking Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA; [5] Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
Source: CAAI Transactions on Intelligence Technology, 2024, No. 4, pp. 1014-1030 (17 pages)
Funding: National Natural Science Foundation of China, Grant/Award Number: 61872171; The Belt and Road Special Foundation of the State Key Laboratory of Hydrology-Water Resources and Hydraulic Engineering, Grant/Award Number: 2021490811.
Abstract: Multi-agent reinforcement learning relies on reward signals to guide the policy networks of individual agents. However, in high-dimensional continuous spaces, the non-stationary environment can provide outdated experiences that hinder convergence, resulting in ineffective training performance for multi-agent systems. To tackle this issue, a novel reinforcement learning scheme, Mutual Information Oriented Deep Skill Chaining (MioDSC), is proposed that generates an optimised cooperative policy by incorporating intrinsic rewards based on mutual information to improve exploration efficiency. These rewards encourage agents to diversify their learning process by engaging in actions that increase the mutual information between their actions and the environment state. In addition, MioDSC can generate cooperative policies using the options framework, allowing agents to learn and reuse complex action sequences and accelerating the convergence of multi-agent learning. MioDSC was evaluated in the multi-agent particle environment and the StarCraft multi-agent challenge at varying difficulty levels. The experimental results demonstrate that MioDSC outperforms state-of-the-art methods and is robust across various multi-agent system tasks with high stability.
Keywords: artificial intelligence techniques; decision making; intelligent multi-agent systems
Classification: TP18 [Automation and Computer Technology - Control Theory and Control Engineering]
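Note: The abstract describes an intrinsic reward derived from the mutual information between agents' actions and the environment state. The listing below is a minimal, hypothetical Python sketch of one way such a bonus could be estimated from a batch of transitions. It is not the paper's MioDSC implementation; the function names, the discretisation scheme, and the parameters (empirical_mutual_information, shaped_reward, beta, n_bins) are all illustrative assumptions.

    # Minimal, hypothetical sketch of a mutual-information intrinsic reward.
    # NOT the MioDSC implementation; it only illustrates rewarding actions
    # that raise I(action; next state), estimated empirically from a batch.
    import numpy as np

    def empirical_mutual_information(actions, state_features, n_bins=8):
        """Estimate I(A; S) from discrete action indices and a 1-D state
        feature, using empirical joint and marginal distributions."""
        edges = np.linspace(state_features.min(), state_features.max(), n_bins - 1)
        state_bins = np.digitize(state_features, edges)   # bin index in 0..n_bins-1
        joint = np.zeros((int(actions.max()) + 1, n_bins))
        for a, s in zip(actions, state_bins):
            joint[a, s] += 1.0
        joint /= joint.sum()                              # joint p(a, s)
        p_a = joint.sum(axis=1, keepdims=True)            # marginal p(a)
        p_s = joint.sum(axis=0, keepdims=True)            # marginal p(s)
        nonzero = joint > 0
        return float(np.sum(joint[nonzero] *
                            np.log(joint[nonzero] / (p_a @ p_s)[nonzero])))

    def shaped_reward(extrinsic, actions, next_state_features, beta=0.1):
        """Combine the extrinsic reward with a scaled mutual-information bonus."""
        return extrinsic + beta * empirical_mutual_information(actions, next_state_features)

    # Example: a small batch of (action, next-state-feature) pairs.
    acts = np.array([0, 1, 1, 2, 0, 2])
    feats = np.array([0.1, 0.9, 0.8, 0.3, 0.2, 0.4])
    print(shaped_reward(1.0, acts, feats))

Under these assumptions, a batch in which actions are more predictive of the resulting states yields a larger bonus, which is the exploration-shaping effect the abstract attributes to the mutual-information term.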