Decision making for two-player zero-sum Markov games with indistinguishable opponents

Authors: WANG Cheng-yi (王成意), ZHU Jin (朱进), ZHAO Yun-bo (赵云波) (School of Information Science and Technology, University of Science and Technology of China, Hefei, Anhui 230026, China)

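Affiliation: [1] School of Information Science and Technology, University of Science and Technology of China, Hefei, Anhui 230026, China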

Source: Control Theory & Applications (《控制理论与应用》), 2024, No. 11, pp. 2131-2138 (8 pages)

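Funding: Supported by the National Key Research and Development Program of China (2018AAA0100802) and the Natural Science Foundation of Anhui Province (2008085MF198).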

Abstract: This paper studies a typical class of incomplete-information games: two-player zero-sum Markov games with indistinguishable opponents, in which the opponent can be of several types and its type cannot be known before each game begins. We propose a model-based multi-agent reinforcement learning algorithm, distinguishing opponent minimax Q-learning (DOMQ). The algorithm first builds an empirical model of the opponent-dependent environment and then uses this empirical model to learn Nash equilibrium strategies. During actual play, our agent uses the empirical model to infer the opponent's type and executes the corresponding Nash equilibrium strategy, which guarantees a lower bound on its return. The only information DOMQ requires is the opponent's type, revealed at the end of each episode during the sampling phase; it needs no other information about the environment. Simulation results verify the effectiveness of the proposed algorithm.
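The abstract describes DOMQ only at a high level. As a minimal sketch, assuming a finite state space, an agent with exactly two actions, and observable opponent actions (none of which the abstract specifies), the Python below illustrates three ingredients: per-type empirical models of opponent behaviour built from episodes whose type label arrives only after the episode ends, maximum-likelihood identification of the opponent type during play, and a minimax-Q backup whose stage game is solved as a maximin over mixed strategies. The function names, the Laplace-smoothed likelihood rule, and the two-action restriction are hypothetical illustration choices, not the authors' method.

```python
import math
from collections import defaultdict

# Hypothetical sketch of DOMQ-style ingredients; not the authors' implementation.


def build_opponent_models(episodes):
    """Count opponent actions per (opponent type, state).

    Each episode is (opp_type, [(state, opp_action), ...]). The type label is
    assumed to be revealed only after the episode ends, so counts are added
    once the whole episode is available.
    """
    counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for opp_type, steps in episodes:
        for state, opp_action in steps:
            counts[opp_type][state][opp_action] += 1
    return counts


def identify_type(counts, observed, n_opp_actions, alpha=1.0):
    """Pick the type whose empirical model assigns the observed (state, opp_action)
    pairs the highest log-likelihood, with Laplace smoothing (pseudo-count alpha)."""
    best_type, best_ll = None, -math.inf
    for opp_type, by_state in counts.items():
        ll = 0.0
        for state, opp_action in observed:
            row = by_state.get(state, {})  # .get does not insert into the defaultdict
            total = sum(row.values())
            p = (row.get(opp_action, 0) + alpha) / (total + alpha * n_opp_actions)
            ll += math.log(p)
        if ll > best_ll:
            best_type, best_ll = opp_type, ll
    return best_type


def maximin_two_actions(q):
    """Maximin mixed strategy of a row player with two actions.

    q[a][o] is the row player's payoff for own action a and opponent action o.
    Returns (p, value), where p is the probability of action 0. The worst-case
    payoff min_o [p*q[0][o] + (1-p)*q[1][o]] is concave in p, so its maximum
    lies at p = 0, p = 1, or where two of these lines cross.
    """
    n = len(q[0])

    def worst(p):
        return min(p * q[0][o] + (1 - p) * q[1][o] for o in range(n))

    candidates = {0.0, 1.0}
    for i in range(n):
        for j in range(i + 1, n):
            di, dj = q[0][i] - q[1][i], q[0][j] - q[1][j]
            if di != dj:
                p = (q[1][j] - q[1][i]) / (di - dj)
                if 0.0 <= p <= 1.0:
                    candidates.add(p)
    best_p = max(candidates, key=worst)
    return best_p, worst(best_p)


def minimax_q_update(Q, V, s, a, o, r, s_next, lr=0.1, gamma=0.9):
    """One minimax-Q backup (Littman, 1994) on a single opponent type's tables."""
    Q[s][a][o] += lr * (r + gamma * V[s_next] - Q[s][a][o])
    _, V[s] = maximin_two_actions(Q[s])


if __name__ == "__main__":
    episodes = [
        ("aggressive", [(0, 1), (0, 1), (0, 1)]),
        ("passive", [(0, 0), (0, 0), (0, 1)]),
    ]
    models = build_opponent_models(episodes)
    print(identify_type(models, [(0, 1), (0, 1)], n_opp_actions=2))  # aggressive
    print(maximin_two_actions([[1, -1], [-1, 1]]))  # (0.5, 0.0)
```

Keeping one Q table per opponent type and switching to the table of the identified type is one way to realise "execute the corresponding Nash equilibrium strategy"; an agent with more than two actions would replace `maximin_two_actions` with a linear-programming solver for the stage game.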

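Keywords: two-player zero-sum Markov games; incomplete information; minimax Q-learning; Nash equilibrium; multi-agent reinforcement learning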

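CLC numbers: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]; O225 [Automation and Computer Technology: Control Science and Engineering]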
