Research on the state attention mechanism in deep reinforcement learning (cited by: 15)

State attention in deep reinforcement learning

Authors: SHEN Xiangxiang (申翔翔); HOU Xinwen (侯新文) [2]; YIN Chuanhuan (尹传环) [1]

Affiliations: [1] Beijing Key Laboratory of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044, China; [2] Center for Research on Intelligent System and Engineering, Institute of Automation, Chinese Academy of Sciences, Beijing 110016, China

Source: CAAI Transactions on Intelligent Systems (《智能系统学报》), 2020, Issue 2, pp. 317-322 (6 pages)

Funding: Fundamental Research Funds for the Central Universities (2018JBZ006); National Natural Science Foundation of China (61105056).

Abstract: Since deep learning was combined with reinforcement learning, artificial intelligence has surpassed human-level performance in board games and video games. The real-time strategy game StarCraft, however, remains a major challenge for artificial intelligence researchers because of its enormous state and action spaces. The baseline agents that DeepMind trained on the StarCraft II mini-games with the classical deep reinforcement learning algorithm A3C still fall well short of ordinary amateur players. To address this gap, the paper adopts a simpler network structure and combines the attention mechanism with the reward signal in reinforcement learning, proposing an A3C algorithm based on state attention. On individual StarCraft II mini-games, the trained agent achieves the highest score while using fewer feature layers, 71 points above DeepMind's baseline agent.

Keywords: deep learning; reinforcement learning; attention mechanism; A3C algorithm; StarCraft II; mini-game; agent; micromanagement

Classification: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]
