An Opponent Modeling and Strategy Integration Framework for Texas Hold'em


Authors: ZHANG Meng[1,2]; LI Kai; WU Zhe; ZANG Yi-Fan[1,2]; XU Hang; XING Jun-Liang

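Affiliations: [1] Institute of Automation, Chinese Academy of Sciences, Beijing 100190; [2] University of Chinese Academy of Sciences, Beijing 100049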

Source: Acta Automatica Sinica, 2022, No. 4, pp. 1004-1017 (14 pages)

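Funding: Supported by the National Natural Science Foundation of China (62076238, 61902402), the National Key Research and Development Program of China (2020AAA0103401), the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA27000000), and the CCF-Tencent Rhino-Bird Fund (RAGR20200104).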

Abstract: Large-scale imperfect-information games, typified by Texas Hold'em, are a common class of games in the real world. Mainstream Texas Hold'em solvers that aim to compute Nash equilibrium strategies depend heavily on an abstracted game-tree model, consume substantial computing resources, and produce overly conservative strategies, so the resulting agents cannot maximize their payoff against different opponents. To address these problems, we propose a lightweight and efficient framework for solving imperfect-information games that quickly adapts to changes in the opponent's strategy and then exploits the opponent. The framework has two stages: offline training and online play. In the first stage, agents are trained with evolutionary learning, yielding policy networks that can exploit opponents with distinct playing styles. In the second stage, the agent models an opponent of unknown style online, adapts to it, and integrates the trained population of policies by weighting them, which maximizes exploitation of the opponent. Experiments in heads-up no-limit Texas Hold'em show that, against opponents whose strategies change dynamically, the framework significantly outperforms existing methods.
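The online stage described in the abstract (model the opponent, then weight the trained policies into one adaptive strategy) can be illustrated with a small sketch. The abstract does not say how the opponent is represented or how the weights are computed, so everything below is an assumption made for illustration: the two-number "style" vectors, the distance-based softmax weighting, the `temperature` parameter, and the function names `policy_weights` and `integrate` are all hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def policy_weights(opponent_style, trained_styles, temperature=0.1):
    """Weight each trained policy by how close the observed opponent is
    to the opponent style that policy was trained to exploit.

    Assumption: styles are fixed-length feature vectors (for example,
    observed raise frequency and fold frequency). The paper's actual
    opponent model may differ.
    """
    scores = []
    for style in trained_styles:
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(opponent_style, style)))
        scores.append(-dist / temperature)  # closer style -> higher score
    return softmax(scores)


def integrate(action_dists, weights):
    """Mix the per-policy action distributions into one distribution."""
    n_actions = len(action_dists[0])
    return [
        sum(w * dist[a] for w, dist in zip(weights, action_dists))
        for a in range(n_actions)
    ]


# Hypothetical population: three policies, each trained against one style.
trained_styles = [(0.2, 0.2), (0.8, 0.3), (0.5, 0.9)]
# Hypothetical online estimate of the current opponent's style.
opponent_style = (0.75, 0.35)
# Each policy's distribution over (fold, call, raise) in the current state.
action_dists = [[0.6, 0.3, 0.1], [0.1, 0.4, 0.5], [0.3, 0.3, 0.4]]

weights = policy_weights(opponent_style, trained_styles)
mixed = integrate(action_dists, weights)
print(weights, mixed)
```

In this sketch the policy trained against the most similar style dominates the mixture, and re-estimating `opponent_style` as play continues would shift the weights when the opponent changes behavior.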

Keywords: imperfect-information game; Texas Hold'em; evolutionary learning; online opponent modeling; population strategy integration

Classification: O225 [Science: Operations Research and Cybernetics]; TP18 [Science: Mathematics]
