A Meta-evolutionary Learning Algorithm for Opponent Adaptation in Two-player Zero-sum Games

Cited by: 1

Authors: WU Zhe (吴哲) [1,2]; LI Kai (李凯); XU Hang (徐航); XING Jun-Liang (兴军亮)

Affiliations: [1] Center for Research on Intelligent System and Engineering, Institute of Automation, Chinese Academy of Sciences, Beijing 100190; [2] School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049; [3] Department of Computer Science and Technology, Tsinghua University, Beijing 100084

Source: Acta Automatica Sinica (《自动化学报》), 2022, No. 10, pp. 2462-2473 (12 pages)

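Funding: Supported by the National Key Research and Development Program of China (2020AAA0103401), the National Natural Science Foundation of China (62076238, 61902402), the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA27000000), and the CCF-Tencent Rhino-Bird Fund (RAGR20200104).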

Abstract: Research on two-player zero-sum games has recently achieved milestone breakthroughs in problems such as Go and Texas Hold'em. Most existing solutions assume a rational opponent and approximate a Nash equilibrium. This yields a conservative strategy that aims not to lose, but it does not guarantee maximum payoff in practice, for example when the opponent is irrational. Opponent modeling offers another route to maximizing payoff, but building the model remains difficult. Drawing on meta-learning, this paper proposes a meta-evolutionary learning framework that adapts quickly to an opponent's strategy. In the training phase, a population evolution method first continually generates opponents with diverse styles to serve as training data; a meta-strategy update method then adjusts the network weights of the meta-model so that it acquires the ability to adapt quickly. Extensive experiments on Leduc poker, heads-up limit Texas Hold'em (LHE), and RoboSumo show that the algorithm effectively overcomes the drawbacks of existing methods and adapts quickly to opponents of unknown style, providing a new approach to payoff maximization in two-player zero-sum games.
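The two-stage training loop described in the abstract (evolve a population of opponents, then update a meta-model so that it adapts quickly) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the rock-paper-scissors payoff matrix replaces the poker and RoboSumo environments, a softmax policy over three logits replaces the neural network, a Reptile-style first-order update stands in for the paper's meta-strategy update, and a simple select-and-mutate rule stands in for its population evolution method. All function names are hypothetical.

```python
import math
import random

# Toy payoff matrix (rock, paper, scissors) for the row player; an assumption
# standing in for the games evaluated in the paper.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_payoff(logits, opponent):
    """Expected payoff of the softmax policy against a mixed opponent strategy."""
    policy = softmax(logits)
    return sum(policy[i] * PAYOFF[i][j] * opponent[j]
               for i in range(3) for j in range(3))


def adapt(logits, opponent, steps=5, lr=0.5):
    """Inner loop: a few gradient-ascent steps on the payoff against one opponent."""
    logits = list(logits)
    for _ in range(steps):
        policy = softmax(logits)
        action_values = [sum(PAYOFF[i][j] * opponent[j] for j in range(3))
                         for i in range(3)]
        baseline = sum(p * v for p, v in zip(policy, action_values))
        # Exact gradient of the expected payoff with respect to the logits.
        logits = [l + lr * p * (v - baseline)
                  for l, p, v in zip(logits, policy, action_values)]
    return logits


def mutate(opponent, rng, scale=0.3):
    """Perturb an opponent strategy and renormalise it to a probability vector."""
    raw = [max(1e-3, q + rng.uniform(-scale, scale)) for q in opponent]
    total = sum(raw)
    return [r / total for r in raw]


def evolve(population, meta_logits, rng):
    """Assumed rule: keep the opponents exploited least after adaptation, mutate them."""
    ranked = sorted(population,
                    key=lambda q: expected_payoff(adapt(meta_logits, q), q))
    survivors = ranked[:len(population) // 2]
    return survivors + [mutate(q, rng) for q in survivors]


def meta_train(generations=30, pop_size=8, meta_lr=0.1, seed=0):
    rng = random.Random(seed)
    meta_logits = [0.0, 0.0, 0.0]
    population = [mutate([1 / 3] * 3, rng) for _ in range(pop_size)]
    for _ in range(generations):
        for opponent in population:
            adapted = adapt(meta_logits, opponent)
            # Reptile-style step: move the meta-parameters toward the adapted ones.
            meta_logits = [m + meta_lr * (a - m)
                           for m, a in zip(meta_logits, adapted)]
        population = evolve(population, meta_logits, rng)
    return meta_logits, population
```

At test time, the same `adapt` routine would be run from the trained `meta_logits` against a previously unseen opponent, for example `adapt(meta_logits, [0.8, 0.1, 0.1])`, which mirrors the fast-adaptation setting the abstract describes.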

Keywords: two-player zero-sum game; Nash equilibrium; opponent modeling; meta-learning; population evolution

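Classification codes: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]; O225 [Automation and Computer Technology: Control Science and Engineering]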
