Authors: LIANG Rong-qin; ZHU Yuan-heng; ZHAO Dong-bin [1,2] (School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China; The State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China)
Affiliations: [1] School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China; [2] The State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Source: Control Theory & Applications, 2025, No. 2, pp. 226-234 (9 pages)
Funding: Supported by the Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2018AAA0102404); the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA27030400); the National Natural Science Foundation of China (62293541, 62136008); and the Youth Innovation Promotion Association of the Chinese Academy of Sciences (2021132).
Abstract: In the realm of game artificial intelligence, two-player games are a fundamental and important problem, and one-on-one zero-sum fighting games are among the most typical two-player games. In this paper, we study adversarial strategies for fighting games based on deep reinforcement learning. We first model the fighting game environment, designing the states, actions, and reward functions applicable to decision-making in these games, and apply the phasic policy gradient algorithm to learn the adversarial strategy. To learn a strategy as close as possible to the Nash equilibrium and thereby defeat arbitrary opponents, we construct an opponent pool from agents that competed in previous years' tournaments for training, and investigate how the opponent selection mechanism affects the training process. Finally, building on the fixed opponent pool, we design a self-expanding opponent pool algorithm to improve the completeness of the opponent strategies and the robustness of the trained agent. To speed up environment sampling, we extend conventional parallel architectures and design a multi-server distributed parallel sampling framework for two-player games. Experimental comparisons show that the agent trained with the self-expanding opponent pool achieves a 96.6% win rate against the agents in the fixed opponent pool, and a 72.2% win rate against three agents used solely for testing.
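As a rough illustration of the self-expanding opponent pool and opponent selection mechanism summarized in the abstract, the following Python sketch shows one way the idea could be organized. It is not the authors' code; the class name OpponentPool, the 0.95 win-rate threshold, and the moving-average loss statistics are illustrative assumptions. The pool starts from fixed historical agents, samples opponents with probability weighted toward those the learner still loses to, and appends a frozen copy of the learner once it dominates the whole pool.

    # Minimal sketch of a self-expanding opponent pool (assumed design, not the paper's code).
    import copy
    import random


    class OpponentPool:
        """Pool of frozen opponent policies used to sample training opponents."""

        def __init__(self, seed_agents):
            self.agents = list(seed_agents)             # fixed pool of historical agents
            self.loss_rates = [0.5] * len(self.agents)  # learner's estimated loss rate vs. each opponent

        def sample_opponent(self):
            # Opponent selection mechanism: prefer opponents the learner still loses to.
            weights = [max(r, 1e-3) for r in self.loss_rates]
            return random.choices(range(len(self.agents)), weights=weights, k=1)[0]

        def update_stats(self, idx, learner_won):
            # Exponential moving average of the learner's loss rate against opponent idx.
            r = self.loss_rates[idx]
            self.loss_rates[idx] = 0.9 * r + 0.1 * (0.0 if learner_won else 1.0)

        def maybe_grow(self, learner, win_threshold=0.95):
            # Self-expanding step: once the learner beats (almost) every pool member,
            # freeze a copy of it and add it to the pool as a new, harder opponent.
            if all(1.0 - r >= win_threshold for r in self.loss_rates):
                self.agents.append(copy.deepcopy(learner))
                self.loss_rates.append(0.5)


    if __name__ == "__main__":
        # Toy demonstration with placeholder "agents" (labels only, no real policies).
        pool = OpponentPool(seed_agents=["historical_bot_A", "historical_bot_B"])
        learner = "learner_v0"
        for step in range(200):
            idx = pool.sample_opponent()
            learner_won = random.random() < 0.97   # stand-in for an actual match outcome
            pool.update_stats(idx, learner_won)
            pool.maybe_grow(learner)
        print(f"pool size after training: {len(pool.agents)}")

In such a scheme, weighting opponent selection toward agents the learner still loses to, and only enlarging the pool once the learner dominates it, is one plausible way to push the learned strategy toward robustness against arbitrary opponents, in the spirit of the Nash equilibrium objective stated in the abstract.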
Keywords: real-time fighting game; deep reinforcement learning; two-player zero-sum game; opponent strategy pool
CLC Number: TP3 [Automation and Computer Technology — Computer Science and Technology]