Research on a Fusion Method of Spatial Relationship and Memory in Deep Reinforcement Learning

Authors: LIU Hui-Ling, LIU Peng[1], BAI Chen-Jia (School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150008; Shanghai AI Laboratory, Shanghai 200232)

Affiliations: [1] School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150008; [2] Shanghai AI Laboratory, Shanghai 200232

Source: Chinese Journal of Computers (计算机学报), 2023, Issue 4, pp. 814-826 (13 pages)

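Funding: Key Program of the National Natural Science Foundation of China (No. 51935005); Basic Scientific Research Project (No. JCKY20200603C010); Natural Science Foundation of Heilongjiang Province (No. LH2021F023); Science and Technology Program of Heilongjiang Province (No. GA21C031).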

Abstract: Deep reinforcement learning (DRL) leverages the strong visual perception capability of deep learning to solve sequential decision-making problems in complex environments. However, because of its low sample efficiency, it is difficult for DRL to learn the important features of complex, high-dimensional inputs. To extract information from sequential samples more effectively, this paper proposes a model structure that integrates Spatial Relationship Reasoning and Memory Reasoning (SRRMR) into deep reinforcement learning. The model consists of two parts: spatial relationship reasoning and memory reasoning. Spatial relationship reasoning uses an attention mechanism to implicitly infer the relationship between any two entities, and the query vectors of the attention mechanism incorporate the content of memory reasoning. Memory reasoning takes the features and relations of the input image as the input to memory, uses self-attention to reason over and interact with the memory components, and stores the results of this interaction in the memory units, so that the memory storage integrates both spatial information and memory information. The SRRMR model was trained and evaluated on a range of Atari games. The results show that fusing spatial relationship reasoning with memory reasoning converges to better results with fewer interactions in 7 of 15 game environments, and that the memory reasoning network yields improvements in 12 of 15 games. This improves the learning efficiency of the agent, makes more efficient use of the samples in a sequence, and raises the sample utilization of reinforcement learning.
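The abstract describes the two SRRMR components only at a high level. The following is a minimal, hypothetical Python sketch of the data flow, not the authors' implementation. It assumes identity query/key/value projections instead of learned weights, fuses memory into each query by adding the mean memory slot, and uses a residual add in place of the paper's actual memory-update rule, which the abstract does not specify. All function names (`spatial_relation_reasoning`, `memory_reasoning`, `srrmr_step`) are illustrative.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention with identity projections (no learned weights)."""
    d = len(keys[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(d)])
    return out


def spatial_relation_reasoning(entities: List[Vector], memory: List[Vector]) -> List[Vector]:
    """Each entity attends to all entities (itself included), giving pairwise relations.

    Assumption: memory is fused into every query by adding the mean memory slot.
    """
    d = len(entities[0])
    mem_summary = [sum(slot[j] for slot in memory) / len(memory) for j in range(d)]
    queries = [[e[j] + mem_summary[j] for j in range(d)] for e in entities]
    return attend(queries, entities, entities)


def memory_reasoning(memory: List[Vector], relations: List[Vector]) -> List[Vector]:
    """Memory slots self-attend over [memory; relation features], then update.

    Assumption: a residual add stands in for the learned gating/MLP of a real model.
    """
    context = memory + relations
    attended = attend(memory, context, context)
    return [[m + a for m, a in zip(slot, upd)] for slot, upd in zip(memory, attended)]


def srrmr_step(entities: List[Vector], memory: List[Vector]):
    """One hypothetical step: infer relations, then write them into memory."""
    relations = spatial_relation_reasoning(entities, memory)
    new_memory = memory_reasoning(memory, relations)
    return relations, new_memory


# Usage: 3 entities (e.g. CNN feature-map cells) with d = 2, and 2 memory slots.
entities = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
memory = [[0.0, 0.0], [0.5, -0.5]]
relations, memory = srrmr_step(entities, memory)
print(len(relations), len(memory))  # prints: 3 2
```

Because every entity attends to every entity, relations emerge implicitly from pairwise attention scores rather than from hand-specified object pairs, and because the query carries a memory summary, the same frame can yield different relation features depending on what has been remembered. A practical model would likely add learned projections, multiple heads, and gated memory writes.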

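Keywords: spatial relationship reasoning; memory reasoning; deep reinforcement learning; attention mechanism; state representation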

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
