Authors: 刘卉玲 (LIU Hui-Ling), 刘鹏 (LIU Peng)[1], 白辰甲 (BAI Chen-Jia)
Affiliations: [1] School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150008; [2] Shanghai AI Laboratory, Shanghai 200232
Source: Chinese Journal of Computers (《计算机学报》), 2023, No. 4, pp. 814-826 (13 pages)
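Funding: Supported by the Key Program of the National Natural Science Foundation of China (No. 51935005), the Basic Research Project (No. JCKY20200603C010), the Natural Science Foundation of Heilongjiang Province (No. LH2021F023), and the Science and Technology Plan Project of Heilongjiang Province (No. GA21C031).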
Abstract: Deep reinforcement learning uses the strong visual perception of deep learning to solve sequential decision-making problems in complex environments. However, sample efficiency is low, so it is difficult to learn the important features of complex, high-dimensional inputs. To extract information from sequential samples more effectively, this paper proposes a model architecture for deep reinforcement learning that integrates Spatial Relationship Reasoning and Memory Reasoning (SRRMR). The model consists of two parts.

Spatial relationship reasoning uses an attention mechanism to learn spatial relations, implicitly inferring the relationship between any two entities. The query vectors of this attention mechanism incorporate the contents of memory reasoning.

Memory reasoning takes the features and relations of the input image as the input to memory. It uses self-attention to reason over and interact with the memory components and stores the results of this interaction in the memory unit. As a result, the memory storage unit fuses spatial information with memory information.

The SRRMR model was trained and evaluated on a variety of Atari games. Fusing spatial relationship reasoning with memory reasoning converged to better results with fewer environment interactions in 7 of 15 game environments. The memory reasoning network alone yielded improvements in 12 of 15 games. Together, these results indicate higher learning efficiency for the agent, more efficient use of the samples in a sequence, and better sample utilization in reinforcement learning.
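The abstract describes the two components only at a high level. The following is a minimal, hypothetical sketch in plain Python of how such a step might fit together. It is not the authors' implementation and rests on these assumptions:

- Entities are plain feature vectors, for example the cells of a CNN feature map.
- Attention uses no learned query/key/value projections.
- Memory content enters the spatial queries by adding the mean memory slot to each entity. This fusion rule is invented for illustration.
- Memory is updated by letting each slot attend over the concatenation of memory and relational features, followed by a plain residual write-back with no gating.

The function names `spatial_relation_reasoning`, `memory_reasoning`, and `srrmr_step` are illustrative only.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention; no learned projections (simplifying assumption)."""
    d = len(keys[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        # Each output is a convex combination of the value vectors.
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out


def spatial_relation_reasoning(entities: List[Vector], memory: List[Vector]) -> List[Vector]:
    d = len(entities[0])
    # Hypothetical fusion rule: add the mean memory slot to every entity to form the queries.
    mem_summary = [sum(slot[i] for slot in memory) / len(memory) for i in range(d)]
    queries = [[e[i] + mem_summary[i] for i in range(d)] for e in entities]
    # Every entity attends to every entity, so pairwise relations live in the attention weights.
    return attend(queries, entities, entities)


def memory_reasoning(memory: List[Vector], relations: List[Vector]) -> List[Vector]:
    # Memory slots attend over [memory; relational features] (self-attention over memory and input).
    context = memory + relations
    updates = attend(memory, context, context)
    # Residual write-back into the memory slots (simplification: no gating).
    return [[m[i] + u[i] for i in range(len(m))] for m, u in zip(memory, updates)]


def srrmr_step(entities: List[Vector], memory: List[Vector]):
    """One hypothetical step: relational reasoning first, then the memory update."""
    relations = spatial_relation_reasoning(entities, memory)
    new_memory = memory_reasoning(memory, relations)
    return relations, new_memory


if __name__ == "__main__":
    entities = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy 2-D entities
    memory = [[0.0, 0.0], [0.5, -0.5]]                # two toy memory slots
    relations, memory = srrmr_step(entities, memory)
    print(len(relations), len(memory))  # 3 2
```

The two functions are kept separate so that the memory returned by one step can be fed back into the next step's queries. This mirrors the abstract's description of memory content flowing into the spatial attention. A real model would add learned projections, multiple heads, and gated memory writes.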
Keywords: spatial relationship reasoning; memory reasoning; deep reinforcement learning; attention mechanism; state representation
CLC number: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]