Embedding rules into multiagent reinforcement learning based on iterative training

Chinese title: 基于组合训练的规则嵌入多智能体强化学习方法 (cited by: 3)

Authors: Li Yuan; Xu Xinhai (Academy of Military Sciences, Beijing 100190, China)

Source: Application Research of Computers (《计算机应用研究》), 2022, No. 3, pp. 802-806 (5 pages)

Funding: Supported by the National Youth Science Fund.

Abstract: Multi-agent reinforcement learning has made notable progress in simulation, adversarial games, recommender systems, and many other areas. However, complex real-world problems leave reinforcement learning with many wasted explorations, slow training, and difficulty in sustaining improvement. This paper studies multi-agent reinforcement learning with embedded rules and proposes a way to combine rules with learning through combined (iterative) training. It designs a multi-agent reinforcement learning model that incorporates rules and a separate rule selection model, and joins the two through combined training, so that the method can decide from the current game state whether to act on the reinforcement learning result or on a rule. This addresses both which rules to use during learning and when to use them. The method is evaluated on a multi-agent combat platform released by the China Electronics Technology Group. Against the platform's built-in opponent, the rule-embedded method converges to a 60% win rate after about 14,000 training rounds, whereas the algorithm without embedded rules needs about 17,000 rounds to converge to a 50% win rate. The results indicate that embedding rules improves both the convergence speed and the final performance of multi-agent reinforcement learning.
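The decision step described in the abstract, choosing per state between the learned policy and a rule, can be illustrated with a small sketch. This is not the paper's model: the abstract gives no architecture, so every name here (`RuleSelector`, `decide`, the two example rules, the state fields `health` and `enemy_distance`, the random `rl_policy` placeholder) is a hypothetical stand-in, and a tabular value estimate is swapped in for whatever learned rule selection model the authors use. The alternating training of the two models is not shown.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, float]
Rule = Callable[[State], Optional[str]]
ACTIONS = ["attack", "retreat", "advance", "hold"]


# Hypothetical hand-written rules: each returns an action, or None if it does not apply.
def retreat_when_weak(state: State) -> Optional[str]:
    return "retreat" if state["health"] < 0.3 else None


def attack_when_close(state: State) -> Optional[str]:
    return "attack" if state["enemy_distance"] < 2.0 else None


RULES: List[Rule] = [retreat_when_weak, attack_when_close]


def rl_policy(state: State, rng: random.Random) -> str:
    # Placeholder for a trained multi-agent policy (assumption): picks a random action.
    return rng.choice(ACTIONS)


class RuleSelector:
    """Tabular stand-in for a rule selection model.

    Option 0 means "use the RL policy"; option i > 0 means "use RULES[i - 1]".
    """

    def __init__(self, n_options: int, epsilon: float = 0.1, lr: float = 0.1) -> None:
        self.n_options = n_options
        self.epsilon = epsilon
        self.lr = lr
        self.values: Dict[Tuple[bool, bool], List[float]] = {}

    @staticmethod
    def situation(state: State) -> Tuple[bool, bool]:
        # Coarse discretisation of the game state (assumption).
        return (state["health"] < 0.3, state["enemy_distance"] < 2.0)

    def _row(self, state: State) -> List[float]:
        return self.values.setdefault(self.situation(state), [0.0] * self.n_options)

    def select(self, state: State, rng: random.Random) -> int:
        row = self._row(state)
        if rng.random() < self.epsilon:  # explore
            return rng.randrange(self.n_options)
        return max(range(self.n_options), key=row.__getitem__)  # exploit

    def update(self, state: State, option: int, reward: float) -> None:
        # Move the estimate for the chosen option toward the observed reward.
        row = self._row(state)
        row[option] += self.lr * (reward - row[option])


def decide(state: State, selector: RuleSelector, rng: random.Random) -> Tuple[int, str, str]:
    """Return (chosen option, action, source), where source is "rule" or "rl"."""
    option = selector.select(state, rng)
    if option > 0:
        action = RULES[option - 1](state)
        if action is not None:
            return option, action, "rule"
    # Option 0, or the chosen rule did not apply: fall back to the RL policy.
    return option, rl_policy(state, rng), "rl"
```

In a training loop one would call `decide` at each step and feed the resulting reward back through `selector.update`, so the selector gradually learns in which situations a rule outperforms the learned policy. Falling back to the RL policy when a selected rule does not fire is a design choice of this sketch, not something the abstract specifies.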

Keywords: multi-agent reinforcement learning; embedded rules; rule selection model; combined training

Classification: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]
