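Authors: Li Yuan (李渊), Xu Xinhai (徐新海) (Academy of Military Sciences, Beijing 100190, China)
Affiliation: [1] Academy of Military Sciences, Beijing 100190
Source: Application Research of Computers (《计算机应用研究》), 2022, No. 3, pp. 802-806, 5 pages
Funding: Supported by a National Youth Science Fund project.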
Abstract: Multi-agent reinforcement learning has made notable progress in simulation, game playing, recommendation systems, and many other areas. However, on complex real-world problems, reinforcement learning suffers from a large amount of ineffective exploration, slow training, and difficulty in sustaining improvement in learning ability. This paper studies multi-agent reinforcement learning with embedded rules and proposes a way of combining rules and learning based on combined training (called an iterative training mechanism in the paper's own English abstract). A multi-agent reinforcement learning model that incorporates rules and a rule selection model are designed separately and then integrated through this training, so that the method can decide, from the current situation, whether to use the reinforcement learning decision or a rule decision. This addresses the questions of which rules to use during learning and when to use them. The method is evaluated on the multi-agent adversarial platform released by the China Electronics Technology Group. Against the platform's built-in opponent, the method with embedded rules converged to a 60% win rate after about 14,000 training games, whereas the algorithm without embedded rules needed about 17,000 games to converge to a 50% win rate. The results indicate that embedding rules can effectively improve both the convergence speed and the final performance of learning.
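Keywords: multi-agent reinforcement learning; embedded rules; rule selection model; combined training
Classification code: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]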
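
The abstract describes a selection step that decides, from the current state, whether to act on the reinforcement learning output or on a rule. The record gives no architectural details, so the following is only a minimal sketch of that idea under stated assumptions, not the authors' implementation. The state format, the `Rule` class, the `threshold_selector` function (a hand-written stand-in for the learned rule selection model), and the random `rl_policy` (a stand-in for a trained policy) are all hypothetical names introduced for illustration.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple

State = Dict[str, float]  # hypothetical observation format
Action = str              # hypothetical discrete action label


@dataclass
class Rule:
    """A hand-written rule that applies only when its condition holds."""
    name: str
    condition: Callable[[State], bool]
    action: Callable[[State], Action]


def rl_policy(state: State, actions: Sequence[Action], rng: random.Random) -> Action:
    """Stand-in for a trained RL policy; here it just picks uniformly at random."""
    return rng.choice(list(actions))


def threshold_selector(state: State) -> int:
    """Stand-in for a learned rule selection model.

    Returns 0 to use the RL policy, or i > 0 to use rules[i - 1].
    """
    return 1 if state["health"] < 0.3 else 0


def select_action(
    state: State,
    rules: Sequence[Rule],
    selector: Callable[[State], int],
    actions: Sequence[Action],
    rng: random.Random,
) -> Tuple[str, Action]:
    """Return (decision source, action) for the current state."""
    choice = selector(state)
    if choice > 0:
        rule = rules[choice - 1]
        if rule.condition(state):
            return rule.name, rule.action(state)
    # Fall back to RL when the selector picks it or the chosen rule does not apply.
    return "rl", rl_policy(state, actions, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    actions = ["attack", "move", "retreat"]
    rules = [Rule("retreat_when_weak", lambda s: s["health"] < 0.3, lambda s: "retreat")]
    for health in (0.1, 0.9):
        print(health, select_action({"health": health}, rules, threshold_selector, actions, rng))
```

In a setup like the one the abstract outlines, the selector would itself be trained together with the policy rather than fixed by a threshold; the fixed threshold here only makes the control flow visible.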