Twin-delayed-based Deep Deterministic Policy Gradient Method Integrating Gravitational Search

Cited by: 2

Authors: XU Ping-An (徐平安); LIU Quan (刘全) [1,2,3,4]; HAO Shao-Pu (郝少璞); ZHANG Li-Hua (张立华)

Affiliations: [1] School of Computer Science & Technology, Soochow University, Suzhou 215006, China [2] Collaborative Innovation Center of Novel Software Technology and Industrialization (Nanjing), Nanjing 210093, China [3] Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education (Jilin University), Changchun 130012, China [4] Jiangsu Provincial Key Laboratory for Computer Information Processing Technology (Soochow University), Suzhou 215006, China

Source: Journal of Software (软件学报), 2023, No. 11, pp. 5191-5204 (14 pages)

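Funding: National Natural Science Foundation of China (61772355, 61702055, 61876217, 62176175); Priority Academic Program Development of Jiangsu Higher Education Institutions.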

Abstract: In recent years, deep reinforcement learning has achieved impressive results in complex control tasks. However, its applicability to real-world problems has been seriously weakened by the high sensitivity of hyperparameters and the difficulty in guaranteeing convergence. Metaheuristic algorithms, as a class of black-box optimization methods simulating the objective laws of nature, can effectively avoid the sensitivity of hyperparameters. Nevertheless, they still face problems such as the inability to scale to a very large number of parameters to be optimized and low sample efficiency. To address these problems, this study proposes the twin delayed deep deterministic policy gradient based on a gravitational search algorithm (GSA-TD3). The method combines the advantages of the two types of algorithms. First, it updates the policy by gradient optimization for higher sample efficiency and faster learning. Second, it introduces the population update method based on the law of universal gravitation into the policy search process, making it more exploratory and more stable. GSA-TD3 is applied to a series of complex control tasks, and experiments show that it significantly outperforms comparable state-of-the-art deep reinforcement learning methods.
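The abstract does not say how the gravity-based population update is applied to policy parameters, so the following is only a minimal sketch of one step of the standard gravitational search algorithm, not the authors' GSA-TD3 implementation. It assumes each row of `positions` is a flattened parameter vector of one policy in the population and `fitness` holds a score such as the average episode return (higher is better). The function name `gsa_step`, its parameters, and the per-dimension random weighting are illustrative assumptions.

```python
import numpy as np


def gsa_step(positions, velocities, fitness, g, rng, k_best=None, eps=1e-12):
    """One gravitational-search update for a population (maximizing fitness).

    positions, velocities: arrays of shape (n_agents, dim)
    fitness: array of shape (n_agents,), higher is better (assumption)
    g: current value of the gravitational constant G(t)
    rng: a numpy.random.Generator
    k_best: how many of the heaviest agents exert force (default: all)
    """
    positions = np.asarray(positions, dtype=float)
    velocities = np.asarray(velocities, dtype=float)
    fitness = np.asarray(fitness, dtype=float)
    n, dim = positions.shape

    # Normalized masses: the fittest agent is heaviest, the worst has mass 0.
    best, worst = fitness.max(), fitness.min()
    if best == worst:
        raw = np.ones(n)
    else:
        raw = (fitness - worst) / (best - worst)
    mass = raw / raw.sum()

    # Only the k heaviest agents attract the others.
    k = n if k_best is None else k_best
    attractors = np.argsort(-mass)[:k]

    accel = np.zeros_like(positions)
    for i in range(n):
        for j in attractors:
            if i == j:
                continue
            diff = positions[j] - positions[i]
            dist = np.linalg.norm(diff)
            # Acceleration of agent i: randomly weighted pull toward agent j,
            # proportional to G and to the mass of j (the mass of i cancels).
            accel[i] += rng.random(dim) * g * mass[j] * diff / (dist + eps)

    # Randomly damped previous velocity plus the new acceleration.
    new_velocities = rng.random((n, dim)) * velocities + accel
    new_positions = positions + new_velocities
    return new_positions, new_velocities
```

In a hybrid scheme of the kind the abstract describes, a step like this would alternate with gradient-based policy updates; how the two are interleaved in GSA-TD3 is not stated here. In the standard algorithm, `g` is usually decayed over iterations so that the search shifts from exploration to exploitation.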

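Keywords: deep reinforcement learning; metaheuristic algorithm; gravitational search; deterministic policy gradient; policy search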

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
