Policy reuse in reinforcement learning: research progress (cited by: 5)

Survey on policy reuse in reinforcement learning

Authors: HE Li; SHEN Liang; LI Hui [1,2]; WANG Zhuang; TANG Wenquan

Affiliations: [1] School of Computer Science (Software), Sichuan University, Chengdu 610065, Sichuan, China; [2] National Key Laboratory of Fundamental Science on Synthetic Vision, Sichuan University, Chengdu 610065, Sichuan, China; [3] Jiangxi Hongdu Aviation Industry Group Company Limited, Nanchang 330024, Jiangxi, China

Source: Systems Engineering and Electronics (《系统工程与电子技术》), 2022, No. 3, pp. 884-899 (16 pages)

Funding: Supported by the "13th Five-Year Plan" pre-research project on all-army shared information system equipment (31505550302).

Abstract: Policy reuse (PR) is a transfer learning (TL) method that exploits the intrinsic connections among tasks, applying experience and knowledge learned in the past to accelerate learning of the current target task. It largely addresses the slow convergence and high resource consumption of traditional reinforcement learning (RL), and it avoids the difficulty of reusing what has been learned across similar problems. This paper reviews PR methods in RL, dividing existing methods into policy reconstruction, reward shaping, problem transformation, and similarity measurement. It presents and analyzes the characteristics of each category, along with their extensions to multi-agent scenarios and deep RL (DRL). It then introduces methods for mapping between source and target tasks. Finally, based on current applications of PR, it offers some conjectures and assumptions about future directions for the field.

Keywords: reinforcement learning; transfer learning; policy reuse; task mapping

Classification: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]
