基于蒙特卡洛树搜索的通用博弈系统的构建与优化研究  被引量:3

Optimizing Monte Carlo Tree Search for General Game Playing

在线阅读下载全文

作  者:梁思立 姜桂飞 陈泰劼 邓益超 战瑀璠 张玉志 LIANG Sili;JIANG Guifei;CHEN Taijie;DENG Yichao;ZHAN Yufan;ZHANG Yuzhi(College of Software,Nankai University,Tianjin 300350,China;Department of Computer Science,University of Hong Kong,Hong Kong 999077,China;School of Computing,National University of Singapore,Singapore 117417,Singapore;School of Finance,Nankai University,Tianjin 300381,China)

机构地区:[1]南开大学,软件学院,天津300350 [2]香港大学,工程学院,中国香港特别行政区999077 [3]新加坡国立大学,计算机学院,新加坡117417 [4]南开大学,金融学院,天津300381

出  处:《数据与计算发展前沿》2022年第3期66-77,共12页Frontiers of Data & Computing

基  金:国家重点研发计划(2021YFB0300104);国家自然科学基金(61806102)。

摘  要:【背景】作为人工智能的主要研究领域,通用博弈策略(General Game Playing,简称GGP)旨在构建具有通用智能的博弈系统。这些系统能够基于给定的博弈规则在没有人为干涉的情况下成功地进行多个甚至是全新构造的博弈。【目的】与专门的博弈系统不同,通用博弈系统所使用的策略生成算法并不针对特定博弈,而是能够根据给定的博弈规则自动生成博弈策略的具有通用性的算法。GGP发展至今已成为检测人工智能水平,特别是通用智能发展的重要研究领域。如何构建高效的通用博弈系统是GGP研究的主要问题。【文献范围】通用博弈策略的生成算法是构建通用博弈系统的关键技术。目前所使用的主流算法是蒙特卡洛树搜索算法及其变种。这类算法在工作过程中并不依赖特定的博弈信息,因而被广泛地应用于GGP领域。然而,由博弈规则推导出来的关于博弈的专门信息,往往对建立针对这一博弈的有效决策算法具有重要的作用。【方法】为此,本文通过在蒙特卡洛树搜索算法上增加记忆结构来存储在线博弈过程中的实时信息,用记忆结构中博弈状态的相似状态来估计该状态的好坏,以提高状态评估的准确性。【结果】本文基于这一方法构建了通用博弈系统并对其性能进行了全面地评估。实验结果表明,与原始的蒙特卡洛方法相比,本文所构建的通用博弈系统在决策水平和效率上都有显著提升,特别在双人信息对称的零和回合制博弈中胜率保持在55%以上,且其性能随着博弈规模的增大而显著提升,在Connect 5、Breakthrough等大规模的游戏上有着绝对优势,即达到100%胜率。【结论】这表明本文所提出的方法通过利用博弈的专门信息能够有效地提升蒙特卡洛树搜索算法的性能。[Background]General Game Playing(GGP)is concerned with creating intelligent agents that understand the rules of previously unknown games and learn to play these games well without human intervention.[Objective]Unlike specialized systems,a general game player cannot rely on algorithms designed in advance for specific games.Such a system rather requires a form of general intelligence that enables it to autonomously generate strategies based on the given game rules.With the decade of development,GGP provides an important testbed for AI,especially artificial general intelligence.The main problem of GGP is how to build an efficient general game player.Strategy generation is the core technique for building a general game player.[Scope of the literature]The main algorithms used in previous successful general game players are the Monte Carlo Tree Search(MCTS)algorithm and its variants.[Methods]To improve MCTS during the online real-time search,this paper incorporates it with a memory structure,where each entry contains information about a particular state.This memory is used to generate an approximate value estimation by combining the estimations of similar states.[Results]Based on this method,we implement and evaluate a general game player.The experimental results show that it can outperform the original Monte Carlo player in a variety of games.Especially,in two-person zero-sum and turn-based games with symmetric information,the built general game player achieves a winning rate of more than 55%,and its performance improves significantly with the increase of the game size,even 100%winning rate in large-scale games such as Connect 5 and Breakthrough.[Conclusions]These results have confirmed the feasibility of the proposed method to use game-dependent information for improving the performance of MCTS.

关 键 词:通用博弈策略 蒙特卡洛树搜索 算法博弈论 多智能体系统 

分 类 号:TP18[自动化与计算机技术—控制理论与控制工程]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象