Authors: LIANG Rong-qin; ZHU Yuan-heng; ZHAO Dong-bin [1,2] (School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China; The State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China)
Affiliations: [1] School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China; [2] The State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Source: Control Theory & Applications, 2025, No. 2, pp. 226-234 (9 pages)
Funding: Supported by the Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2018AAA0102404); the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA27030400); the National Natural Science Foundation of China (62293541, 62136008); and the Youth Innovation Promotion Association of the Chinese Academy of Sciences (2021132).
Abstract: In the realm of game artificial intelligence, two-player games are a fundamental and important problem, and one-on-one zero-sum fighting games are among the most typical two-player games. In this paper, we study adversarial strategies for fighting games based on deep reinforcement learning. We first model the fighting game environment, designing the states, actions, and reward functions applicable to decision-making in these games, and apply the phasic policy gradient algorithm to learn the adversarial strategy. To learn a strategy as close as possible to the Nash equilibrium and thereby defeat arbitrary opponents, we construct an opponent pool from agents that competed in previous years' tournaments for training, and investigate how the opponent selection mechanism affects the training process. Finally, building on the fixed opponent pool, we design a self-expanding opponent pool algorithm to improve the completeness of the opponent strategies and the robustness of the trained agent. To speed up environment sampling, we extend conventional parallel architectures and design a multi-server distributed parallel sampling framework for two-player games. Experimental comparisons show that the agent trained with the self-expanding opponent pool achieves a 96.6% win rate against the agents in the fixed opponent pool, and a 72.2% win rate against three agents used solely for testing.
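As a rough illustration of the self-expanding opponent pool and opponent selection mechanism summarized in the abstract, the following Python sketch shows one way the idea could be organized. It is not the authors' code; the class name OpponentPool, the 0.95 win-rate threshold, and the moving-average loss statistics are illustrative assumptions. The pool starts from fixed historical agents, samples opponents with probability weighted toward those the learner still loses to, and appends a frozen copy of the learner once it dominates the whole pool.

    # Minimal sketch of a self-expanding opponent pool (assumed design, not the paper's code).
    import copy
    import random


    class OpponentPool:
        """Pool of frozen opponent policies used to sample training opponents."""

        def __init__(self, seed_agents):
            self.agents = list(seed_agents)             # fixed pool of historical agents
            self.loss_rates = [0.5] * len(self.agents)  # learner's estimated loss rate vs. each opponent

        def sample_opponent(self):
            # Opponent selection mechanism: prefer opponents the learner still loses to.
            weights = [max(r, 1e-3) for r in self.loss_rates]
            return random.choices(range(len(self.agents)), weights=weights, k=1)[0]

        def update_stats(self, idx, learner_won):
            # Exponential moving average of the learner's loss rate against opponent idx.
            r = self.loss_rates[idx]
            self.loss_rates[idx] = 0.9 * r + 0.1 * (0.0 if learner_won else 1.0)

        def maybe_grow(self, learner, win_threshold=0.95):
            # Self-expanding step: once the learner beats (almost) every pool member,
            # freeze a copy of it and add it to the pool as a new, harder opponent.
            if all(1.0 - r >= win_threshold for r in self.loss_rates):
                self.agents.append(copy.deepcopy(learner))
                self.loss_rates.append(0.5)


    if __name__ == "__main__":
        # Toy demonstration with placeholder "agents" (labels only, no real policies).
        pool = OpponentPool(seed_agents=["historical_bot_A", "historical_bot_B"])
        learner = "learner_v0"
        for step in range(200):
            idx = pool.sample_opponent()
            learner_won = random.random() < 0.97   # stand-in for an actual match outcome
            pool.update_stats(idx, learner_won)
            pool.maybe_grow(learner)
        print(f"pool size after training: {len(pool.agents)}")

In such a scheme, weighting opponent selection toward agents the learner still loses to, and only enlarging the pool once the learner dominates it, is one plausible way to push the learned strategy toward robustness against arbitrary opponents, in the spirit of the Nash equilibrium objective stated in the abstract.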
Keywords: real-time fighting game; deep reinforcement learning; two-player zero-sum game; opponent strategy pool
CLC Number: TP3 [Automation and Computer Technology — Computer Science and Technology]