A Survey of Reinforcement Learning Algorithms from a Fixed Point Perspective (不动点视角下的强化学习算法综述)  Cited by: 3

Authors: CHEN Xing-Guo (陈兴国)[1,2]; SUN Dingyuanhao (孙丁源昊); YANG Guang (杨光)[2,3]; YANG Shang-Dong (杨尚东); GAO Yang (高阳)[2,3]

Affiliations: [1] Jiangsu Key Laboratory of Big Data Security & Intelligent Processing, Nanjing University of Posts and Telecommunications, Nanjing 210023 [2] National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210046 [3] Shenzhen Research Institute of Nanjing University, Shenzhen, Guangdong 518057

Source: Chinese Journal of Computers (《计算机学报》), 2023, No. 6, pp. 1246-1271 (26 pages)

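Funding: Supported by the National Natural Science Foundation of China (62276142, 62206133, 62202240, 62192783); the Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2018AAA0100905); the Jiangsu Province Industry Foresight and Key Core Technology Competition Project (BE2021028); and the Shenzhen Central Government Guided Local Science and Technology Development Fund (2021Szvup056).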

Abstract: In recent years, reinforcement learning has become a paradigm for solving sequential decision-making tasks. In practical applications, however, reinforcement learning algorithms still face three problems: (1) Which solution is optimal? (2) How can the stability of the algorithm be guaranteed? (3) How can the convergence of the algorithm be accelerated? This paper summarizes the design principles of reinforcement learning algorithms from a fixed point perspective. First, it analyzes the optimal solution of value function estimation and the construction of feasible solutions. Second, based on the Banach fixed-point theorem and Lyapunov's second stability theorem, it summarizes the stability of existing value function-based reinforcement learning algorithms, including the convergence of algorithms with tabular, linear, nonlinear, and nonparametric value function estimation in both the on-policy and off-policy settings. Then, from the viewpoint of controlling the bias and variance of the fixed point, it interprets a variety of improvements that raise the accuracy or convergence speed of the algorithms. Finally, it summarizes and discusses directions for improving reinforcement learning algorithms.

Reinforcement learning has been developing for nearly 40 years since it was proposed. In recent years, with the breakthrough of deep learning, reinforcement learning has produced many achievements, such as AlphaGo, AlphaZero, and DouZero. Reinforcement learning has become one of the most promising paths to strong artificial intelligence. More and more researchers are trying to apply reinforcement learning to solve sequential decision-making tasks in their specific fields. However, practical studies show that applying classical reinforcement learning algorithms does not directly meet practical needs. Designing efficient reinforcement learning algorithms for real-world decision problems remains a great challenge for researchers and engineers. Three problems remain for reinforcement learning applications: (1) What is the optimal solution? (2) How to ensure the stability of the algorithm? (3) How to speed up the convergence of the algorithm? Reinforcement learning has grown rapidly with the rise of deep learning, and a dizzying array of algorithms, techniques, and tools has emerged. Researchers urgently need a unified perspective from which to view the latest reinforcement learning techniques. From the fixed point perspective, reinforcement learning algorithm design includes value function-based reinforcement learning and policy gradient-based reinforcement learning. Since there is relatively little research on fixed points for policy gradient-based reinforcement learning, this paper focuses mainly on the fundamentals of value function-based reinforcement learning algorithm design: (1) the optimal solution problem and feasible solution construction for value function estimation; (2) the stability problem of the algorithm, i.e., whether convergence is guaranteed; (3) how quickly the algorithm converges. To this end, this paper summarizes the design principles of reinforcement learning algorithms from a fixed point perspective. First of all, this paper introduces the reinforcement learning model, a

Keywords: reinforcement learning; value function estimation; stability; on-policy; off-policy; bias and variance control

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]

 
