融合策略价值网络的高效棋类游戏算法

Efficient Board Games Algorithm with Integrated Strategy Value Network

作　　者：周毅田永谌邱宇峰高华 ZHOU Yi;TIAN Yongshen;QIU Yufeng;GAO Hua(School of Information Science and Engineering,Wuhan University of Science and Technology,Wuhan 430081,China;Baosight Software(Wuhan)Co.,Ltd.,Wuhan 430080,China)

机构地区：[1]武汉科技大学信息科学与工程学院,湖北武汉430081 [2]宝信软件(武汉)有限公司,湖北武汉430080

出　　处：《计算机与现代化》2025年第1期86-93,共8页Computer and Modernization

基　　金：国家自然科学基金资助项目(62372343)。

摘　　要：棋类游戏一直是深度强化学习的研究热点,因为棋盘和棋类游戏规则具备较高复杂度,寻求棋类游戏的最优解需要耗费大量时间。现阶段的棋类游戏算法采用的基于动作概率分布的动作选择方法导致自我对弈效率低下,且策略和价值需要独立的神经网络计算,样本利用率低、训练耗时长。本文就上述问题提出一种融合策略价值网络的高效棋类游戏算法,以耿贝尔最大值方法替代原本的动作选择方法,且采用ε-greedy算法和模拟退火算法平衡动作搜索中探索与利用的关系。实验结果表明:相较于各种经典棋类游戏算法,本文提出的算法在对战传统算法时胜率达到90%以上。在蒙特卡洛模拟次数较小的情况下,引入耿贝尔最大值采样训练得到的模型的埃洛等级分远高于传统动作选择方法。在训练达到3000埃洛等级分的前提下,本文提出的算法能节约50%的时间。Board games always have been a focus of deep reinforcement learning research due to their complex board configurations and rules,which require a lot of time to find optimal solutions.Current algorithms for chess games use action probability distribution-based methods for action selection during self-play,which leads to inefficient exploration and exploitation.They also require separate neural network computations for strategy and value,resulting in low sample usage and long training times.This paper proposes an efficient chess game algorithm that combines strategy-value networks,replacing the original action selection method with the Geng-Bellman maximum value method.It balances exploration and exploitation in action search usingε-greedy and simulated annealing algorithms.Experimental results show that compared to various classical chess game algorithms,the proposed algorithm achieves a win rate of over 90%against traditional algorithms.Moreover,using Gumbel-max method during training leads to significantly higher Elo ratings compared to traditional action selection methods with low Monte Carlo simulation counts.With training reaching 3000 Elo ratings,the proposed algorithm can save 50%of the time.

关键词：棋类游戏蒙特卡洛树搜索耿贝尔最大值方法 ε-greedy算法模拟退火算法

分类号：TP183[自动化与计算机技术—控制理论与控制工程]

参考文献：

正在载入数据...

二级参考文献：

正在载入数据...

耦合文献：

正在载入数据...

引证文献：

正在载入数据...

二级引证文献：

正在载入数据...

同被引文献：

正在载入数据...

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

融合策略价值网络的高效棋类游戏算法

我的收藏

参考文献：

二级参考文献：

耦合文献：

引证文献：

二级引证文献：

同被引文献：

相关期刊文献：

相关的主题

相关的作者对象

相关的机构对象

下载全文

高级检索检索式检索

时间限定

期刊范围

学科限定全选

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

融合策略价值网络的高效棋类游戏算法

我的收藏

参考文献：

二级参考文献：

耦合文献：

引证文献：

二级引证文献：

同被引文献：

相关期刊文献：

相关的主题

相关的作者对象

相关的机构对象

下载全文

用户登录

高级检索检索式检索