A Review of Imperfect Information Games: Adversarial Solving Methods and Comparative Analysis


Authors: YU Chao [1]; LIU Zong-Kai; HU Chao-Hao; HUANG Kai-Qi [2]; ZHANG Jun-Ge [2] (School of Computer Science and Engineering, Sun Yat-Sen University, Guangzhou 510000; Center for Research on Intelligent System and Engineering, Institute of Automation, Chinese Academy of Sciences, Beijing 100190)

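Affiliations: [1] School of Computer Science, Sun Yat-Sen University, Guangzhou 510006; [2] Center for Research on Intelligent System and Engineering, Institute of Automation, Chinese Academy of Sciences, Beijing 100190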

Source: Chinese Journal of Computers (计算机学报), 2024, No. 9, pp. 2211-2246 (36 pages)

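Funding: General Program of the National Natural Science Foundation of China (No. 62076259); Natural Science Foundation of Guangdong Province (No. 2023A1515012946); Fundamental Cultivation Fund of the Chinese Academy of Sciences (JCPYJJ-22017); Fundamental Research Funds for the Central Universities, Sun Yat-Sen University; Youth Innovation Promotion Association of the Chinese Academy of Sciences.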

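Abstract (translated from the Chinese): Artificial intelligence has become a new engine of economic growth and a core driver of the current round of industrial transformation. "Game intelligence", an emerging research field that combines artificial intelligence with game theory, has attracted growing research interest and has been widely applied in practice. As a typical class of game intelligence problems, imperfect-information games model the strategic behavior of multiple agents that hold private information. They can describe a broader range of decision processes than perfect-information games and arise in real-world settings such as financial trading, business negotiation, and military confrontation. In recent years, research on solving imperfect-information games has made breakthrough progress, giving rise to two main families of offline solving methods, built on regret minimization and best response respectively. The former improves an agent's strategy toward equilibrium by reflecting on its own past decisions, and has successfully solved classic imperfect-information games typified by Texas hold'em poker. The latter improves an agent's strategy toward equilibrium by responding to the opponent's decisions in a targeted way, and plays a key role in training AI for large-scale real-time strategy games such as StarCraft and DOTA. In addition, a family of online solving methods refines, in real time, the blueprint strategy produced by offline algorithms, so that it improves further during actual play; these methods have become a key technology for solving imperfect-information games. Starting from the concept and characteristics of imperfect-information games, this paper comprehensively introduces the basic principles, development history, and improvement techniques of these three classes of methods, compares their strengths and weaknesses in depth, and outlines future research directions. We hope that this thorough survey of imperfect-information game solving will further advance game intelligence technology and contribute to progress toward artificial general intelligence.

Abstract (published English version): Artificial Intelligence (AI) has emerged as a pivotal force in the latest industrial revolution and has become a national strategic priority. The fusion of AI and game theory has given rise to "Game Intelligence" as a leading research domain. Among the diverse facets of game intelligence, Imperfect-Information Games (IIGs) stand out for their ability to simulate the strategic decision-making of multiple agents amidst private information, an accurate portrayal of many real-world scenarios. Compared to perfect-information games, IIGs offer a more nuanced understanding of decision-making processes, making them applicable across various real-world domains such as financial trading, business negotiations, and military operations. Recent strides in IIG research have led to the emergence of two primary streams of offline solving methods: Regret Minimization and Best Response. Regret Minimization continually refines its strategy towards equilibrium by learning from past decisions, making it particularly advantageous in scenarios with unknown or uncertain opponent strategies. On the other hand, Best Response fine-tunes its strategy towards equilibrium by devising tailored countermeasures against opponents' decisions, proving pivotal in training AI for large-scale real-time strategy games like StarCraft and DOTA. The efficacy of the Best Response approach hinges on its ability to anticipate and counteract opponents' moves. Moreover, search-based online solving methods optimize blueprint strategies in real time, facilitating precise Nash equilibrium solutions and constituting a critical technology in IIG solving. The synergy of offline and online solving methods equips AI with the capability to navigate the intricacies of IIGs and attain optimal solutions. This survey aims to provide a comprehensive exploration of the realm of IIGs. Beginning with an elucidation of the concept of IIGs and their distinguishing features, the survey offers an overview of the methods employed for their resolution. Subsequently, it delves into the fundamental principles and histori...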
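
The idea of regret minimization described in the abstract (improving a strategy by looking back at how each alternative action would have fared) can be illustrated with a small sketch. The example below is not taken from the paper: it applies plain regret matching in self-play to a hypothetical 2x2 zero-sum matrix game, and the payoff matrix `PAYOFF` and all function names are assumptions chosen for illustration. Real imperfect-information games such as poker require extensions like counterfactual regret minimization over information sets.

```python
# Hypothetical zero-sum game: PAYOFF[i][j] is the row player's payoff,
# the column player receives the negative. Chosen only for illustration.
PAYOFF = [[2.0, -1.0],
          [-1.0, 1.0]]


def regret_matching_strategy(cum_regret):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total <= 0.0:
        # No action has positive regret yet: fall back to uniform play.
        return [1.0 / len(cum_regret)] * len(cum_regret)
    return [p / total for p in positive]


def solve_matrix_game(payoff, iterations=20000):
    """Self-play regret matching; returns both players' average strategies."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    regret_row, regret_col = [0.0] * n_rows, [0.0] * n_cols
    sum_row, sum_col = [0.0] * n_rows, [0.0] * n_cols

    for _ in range(iterations):
        row = regret_matching_strategy(regret_row)
        col = regret_matching_strategy(regret_col)

        # Expected payoff of each pure action against the opponent's current mix.
        row_values = [sum(payoff[i][j] * col[j] for j in range(n_cols))
                      for i in range(n_rows)]
        col_values = [-sum(payoff[i][j] * row[i] for i in range(n_rows))
                      for j in range(n_cols)]
        row_ev = sum(row[i] * row_values[i] for i in range(n_rows))
        col_ev = sum(col[j] * col_values[j] for j in range(n_cols))

        # Regret: how much better each action would have done than the mix played.
        for i in range(n_rows):
            regret_row[i] += row_values[i] - row_ev
            sum_row[i] += row[i]
        for j in range(n_cols):
            regret_col[j] += col_values[j] - col_ev
            sum_col[j] += col[j]

    # The time-averaged strategies (not the last iterates) approach equilibrium.
    return ([s / iterations for s in sum_row],
            [s / iterations for s in sum_col])


def exploitability(payoff, row, col):
    """Total gain available to both players from switching to a best response."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    best_row = max(sum(payoff[i][j] * col[j] for j in range(n_cols))
                   for i in range(n_rows))
    best_col = min(sum(payoff[i][j] * row[i] for i in range(n_rows))
                   for j in range(n_cols))
    return best_row - best_col


if __name__ == "__main__":
    avg_row, avg_col = solve_matrix_game(PAYOFF)
    print("row strategy:", avg_row)
    print("column strategy:", avg_col)
    print("exploitability:", exploitability(PAYOFF, avg_row, avg_col))
```

The `exploitability` helper also hints at the role of best response: it measures how much each player could gain by best-responding to the other's average strategy, and that gap shrinks toward zero as the averages approach a Nash equilibrium.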

Keywords: imperfect-information games; regret minimization; best response; online solving; reinforcement learning

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
