Twin-delayed-based Deep Deterministic Policy Gradient Method Integrating Gravitational Search  (Cited by: 2)


Authors: XU Ping-An; LIU Quan[1,2,3,4]; HAO Shao-Pu; ZHANG Li-Hua

Affiliations: [1] School of Computer Science & Technology, Soochow University, Suzhou 215006, China [2] Collaborative Innovation Center of Novel Software Technology and Industrialization (Nanjing), Nanjing 210093, China [3] Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education (Jilin University), Changchun 130012, China [4] Jiangsu Provincial Key Laboratory for Computer Information Processing Technology (Soochow University), Suzhou 215006, China

Source: Journal of Software (《软件学报》), 2023, No. 11, pp. 5191-5204 (14 pages)

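Funding: National Natural Science Foundation of China (61772355, 61702055, 61876217, 62176175); Priority Academic Program Development of Jiangsu Higher Education Institutions.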

Abstract: In recent years, deep reinforcement learning has achieved impressive results in complex control tasks. However, its applicability to real-world problems is seriously limited by its high sensitivity to hyperparameters and the difficulty of guaranteeing convergence. Metaheuristic algorithms, a class of black-box optimization methods that simulate objective laws of nature, can effectively avoid this hyperparameter sensitivity, but they scale poorly to very large numbers of parameters to be optimized and use samples inefficiently. To address these problems, this study proposes the twin delayed deep deterministic policy gradient based on the gravitational search algorithm (GSA-TD3). The method combines the advantages of both classes of algorithms. First, it updates the policy by gradient optimization, which yields higher sample efficiency and faster learning. Second, it introduces a population update method based on the law of universal gravitation into the policy search process, which gives it stronger exploration and better stability. GSA-TD3 is applied to a series of complex control tasks, and experiments show that it significantly outperforms state-of-the-art deep reinforcement learning methods of the same type.
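The gravity-based population update mentioned in the abstract can be illustrated with a minimal sketch of one step of the standard gravitational search algorithm (GSA), written here for maximization. This is not the authors' GSA-TD3 implementation: the function name `gsa_step`, the parameters `g0` and `alpha`, the linear shrinking of the attractor set, and the idea of treating each row of `positions` as a flattened policy parameter vector are all assumptions made for illustration.

```python
import numpy as np

def gsa_step(positions, velocities, fitness, t, max_iters,
             g0=100.0, alpha=20.0, eps=1e-12, rng=None):
    """One standard GSA update (maximization). Hypothetical helper, not from the paper.

    positions, velocities: arrays of shape (n_agents, dim)
    fitness: array of shape (n_agents,), larger is better
    """
    rng = np.random.default_rng() if rng is None else rng
    positions = np.asarray(positions, dtype=float)
    velocities = np.asarray(velocities, dtype=float)
    fitness = np.asarray(fitness, dtype=float)
    n, d = positions.shape

    # Normalized masses: the fittest agent is the heaviest.
    best, worst = fitness.max(), fitness.min()
    if best == worst:
        masses = np.full(n, 1.0 / n)
    else:
        raw = (fitness - worst) / (best - worst)
        masses = raw / raw.sum()

    # Gravitational "constant" decays over time to shift from exploration to exploitation.
    g = g0 * np.exp(-alpha * t / max_iters)

    # Only the k heaviest agents attract others; k shrinks linearly from n to 1 (assumed schedule).
    k = max(1, int(round(n - (n - 1) * t / max_iters)))
    attractors = np.argsort(-masses)[:k]

    # Acceleration a_i = sum_j rand * G * M_j * (x_j - x_i) / (R_ij + eps).
    accel = np.zeros_like(positions)
    for i in range(n):
        for j in attractors:
            if i == j:
                continue
            diff = positions[j] - positions[i]
            dist = np.linalg.norm(diff)
            accel[i] += rng.random(d) * g * masses[j] * diff / (dist + eps)

    # Velocity keeps a random fraction of its previous value, then adds the acceleration.
    new_velocities = rng.random((n, d)) * velocities + accel
    return positions + new_velocities, new_velocities
```

In a hybrid scheme of the kind the abstract describes, one might evaluate each candidate policy to obtain `fitness` (for example, an episode return), apply a step like this to the population, and interleave it with gradient-based TD3 updates; how the paper actually combines the two is not specified in this record.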

Keywords: deep reinforcement learning; metaheuristic algorithm; gravitational search; deterministic policy gradient; policy search

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]

 
