Method for Optimizing Parameters of Deep Q Network Based on Evolutionary Algorithms


Authors: CAO Zijian; GUO Ruiqi; JIA Haowen; LI Xiao; XU Kai (School of Computer Science and Engineering, Xi'an Technological University, Xi'an 710021, China)

Affiliation: [1] School of Computer Science and Engineering, Xi'an Technological University, Xi'an 710021, China

Source: Journal of Xi'an Technological University, 2024, No. 2, pp. 219-231 (13 pages)

Funding: Natural Science Basic Research Program of Shaanxi Province (2020JM-565)

Abstract: To address the blind search, imbalanced exploration-exploitation, and slow overall convergence that DQN (Deep Q Network) exhibits in its early stages, this paper starts from the perspective of acquiring and exploiting information that benefits training during early exploration and, taking the Differential Evolution (DE) algorithm as an example, proposes DE-DQN, a method that optimizes the DQN network parameters with an evolutionary algorithm to accelerate convergence. First, the network parameters of the DQN are encoded as evolutionary individuals. Second, two fitness evaluation criteria, "run length" and "average return", are employed separately, and their effectiveness is verified through simulation comparisons on the CartPole control problem. Finally, the experimental results show that after 5,000 generations of agent training, with "run length" as the fitness function the proposed algorithm improves run length, average return, and cumulative return by 82.7%, 18.1%, and 25.1%, respectively; with "average return" as the fitness function the corresponding improvements are 74.9%, 18.5%, and 13.3%. In both settings it outperforms the improved DQN algorithm. This indicates that, compared with traditional DQN and its improved variants, DE-DQN can acquire more useful information in the early stages and therefore converges faster.
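The core idea described in the abstract — encoding flat parameter vectors as individuals and evolving them with Differential Evolution under an episode-based fitness — can be sketched as below. This is a minimal DE/rand/1/bin loop, not the paper's implementation: in DE-DQN the vectors would be the flattened DQN weights and `fitness` would be an episode score on CartPole ("run length" or "average return"); here a toy sphere function stands in, and all names and parameter values are illustrative assumptions.

```python
import random

def differential_evolution(fitness, dim, pop_size=20, f=0.5, cr=0.9,
                           generations=50, seed=0):
    """Minimal DE/rand/1/bin maximizing `fitness` over flat vectors."""
    rng = random.Random(seed)
    # Initialize a population of candidate parameter vectors.
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(pop_size)]
    scores = [fitness(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            # Mutation (rand/1) combined with binomial crossover.
            trial = [
                pop[a][k] + f * (pop[b][k] - pop[c][k])
                if (rng.random() < cr or k == j_rand) else pop[i][k]
                for k in range(dim)
            ]
            s = fitness(trial)
            if s >= scores[i]:  # greedy selection (keep the better vector)
                pop[i], scores[i] = trial, s
    best = max(range(pop_size), key=lambda i: scores[i])
    return pop[best], scores[best]

# Toy stand-in fitness: maximize -sum(x^2); the optimum is the zero vector.
best_x, best_s = differential_evolution(lambda x: -sum(v * v for v in x), dim=5)
```

In the DE-DQN setting, evaluating `fitness` would mean loading the trial vector into the Q-network and running one or more CartPole episodes, so the population effectively trades gradient steps for episode rollouts during the early exploration phase.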

Keywords: deep reinforcement learning; Deep Q Network; convergence acceleration; evolutionary algorithm; automatic control

CLC Classification: TP273 (Automation and Computer Technology — Detection Technology and Automatic Equipment)

 
