基于平均神经网络参数的DQN算法被引量：4

DQN Algorithm Based on Averaged Neural Network Parameters

作　　者：黄志勇吴昊霖王壮李辉[1] HUANG Zhi-yong;WU Hao-lin;WANG Zhuang;LI Hui(College of Computer Science,Sichuan University,Chengdu 610065,China)

机构地区：[1]四川大学计算机学院,成都610065

出　　处：《计算机科学》2021年第4期223-228,共6页Computer Science

基　　金：教育部联合基金(6141A02011607)。

摘　　要：在深度强化学习领域,如何有效地探索环境是一个难题。深度Q网络(Deep Q-Network,DQN)使用ε-贪婪策略来探索环境,ε的大小和衰减需要人工进行调节,而调节不当会导致性能变差。这种探索策略不够高效,不能有效解决深度探索问题。针对DQN的ε-贪婪策略探索效率不够高的问题,提出一种基于平均神经网络参数的DQN算法(Averaged Parameters DQN,AP-DQN)。该算法在回合开始时,将智能体之前学习到的多个在线值网络参数进行平均,得到一个扰动神经网络参数,然后通过扰动神经网络进行动作选择,从而提高智能体的探索效率。实验结果表明,AP-DQN算法在面对深度探索问题时的探索效率优于DQN,在5个Atari游戏环境中相比DQN获得了更高的平均每回合奖励,归一化后的得分相比DQN最多提升了112.50%,最少提升了19.07%。In the field of deep reinforcement learning,how to efficiently explore environment is a hard problem.Deep Q-network algorithm explores environment with epsilon-greedy policy whose size and decay need manual tuning.Unsuitable tuning will cause a poor performance.The epsilon-greedy policy is ineffective and cannot resolve deep exploration problem.In this paper,in order to solve the problem,a deep reinforcement learning algorithm based on averaged neural network parameters(AP-DQN)is proposed.At the beginning of episode,the algorithm averages the multiple online network parameters learned by the agent to obtain a perturbed neural network parameter,and then selects an action through the perturbed neural network,which can improve the agent’s exploration efficiency.Experiment results show that the exploration efficiency of AP-DQN is better than that of DQN on deep exploration problem and AP-DQN get higher scores than DQN in five Atari games.The normalized score increases by 112.50%at most and 19.07%at least compared with DQN.

关键词：深度强化学习深度Q网络神经网络参数深度探索

分类号：TP181[自动化与计算机技术—控制理论与控制工程]

参考文献：

正在载入数据...

二级参考文献：

正在载入数据...

耦合文献：

正在载入数据...

引证文献：

正在载入数据...

二级引证文献：

正在载入数据...

同被引文献：

正在载入数据...

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

基于平均神经网络参数的DQN算法被引量：4

我的收藏

参考文献：

二级参考文献：

耦合文献：

引证文献：

二级引证文献：

同被引文献：

相关期刊文献：

相关的主题

相关的作者对象

相关的机构对象

下载全文

高级检索检索式检索

时间限定

期刊范围

学科限定全选

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

基于平均神经网络参数的DQN算法 被引量：4

我的收藏

参考文献：

二级参考文献：

耦合文献：

引证文献：

二级引证文献：

同被引文献：

相关期刊文献：

相关的主题

相关的作者对象

相关的机构对象

下载全文

用户登录

高级检索检索式检索

基于平均神经网络参数的DQN算法被引量：4