Authors: WU Zhe (吴哲)[1,2]; LI Kai (李凯); XU Hang (徐航); XING Jun-Liang (兴军亮)
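Affiliations: [1] Center for Research on Intelligent System and Engineering, Institute of Automation, Chinese Academy of Sciences, Beijing 100190; [2] School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049; [3] Department of Computer Science and Technology, Tsinghua University, Beijing 100084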
Source: Acta Automatica Sinica (《自动化学报》), 2022, No. 10, pp. 2462-2473 (12 pages)
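Funding: Supported by the National Key Research and Development Program of China (2020AAA0103401); the National Natural Science Foundation of China (62076238, 61902402); the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA27000000); and the CCF-Tencent Rhino-Bird Fund (RAGR20200104).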
Abstract: Research on two-player zero-sum games has recently achieved milestone breakthroughs in problems such as Go and Texas Hold'em. Most existing solutions assume a rational opponent and approximate a Nash equilibrium. This is a conservative strategy that aims not to lose, but it does not guarantee maximum payoff in practice, for example when the opponent is irrational. Opponent modeling offers another route to maximizing payoff, but building the model remains difficult. Drawing on the idea of meta-learning, this paper proposes a meta-strategy evolutionary learning framework that adapts quickly to an opponent's strategy. In the training phase, a population evolution method first generates opponents with diverse styles as training data, and a meta-strategy update method then adjusts the network weights of the meta-model so that it acquires the ability to adapt quickly. Extensive experiments on Leduc poker, heads-up limit Texas Hold'em (LHE), and RoboSumo show that the algorithm overcomes the drawbacks of existing methods and adapts quickly to opponents of unknown style, providing a new approach to payoff maximization in two-player zero-sum games.
Keywords: two-player zero-sum game; Nash equilibrium; opponent modeling; meta-learning; population evolution
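Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]; O225 [Automation and Computer Technology: Control Science and Engineering]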