Research on Memory Heuristic Reinforcement Learning  Cited by: 1


Authors: LIU Xiao-feng [1], LIU Zhi-bin [2], DONG Zhao-an [2]

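Affiliations: [1] Library, Qufu Normal University, Rizhao 276826, Shandong, China; [2] School of Computer Science, Qufu Normal University, Rizhao 276826, Shandong, China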

Source: Computer Technology and Development (《计算机技术与发展》), 2023, No. 6, pp. 168-172, 180 (6 pages)

Funding: Natural Science Foundation of Shandong Province (ZR2020MF149).

Abstract: This paper studies reinforcement learning, a subfield of artificial intelligence. Because reinforcement learning solves optimization problems without relying on model information, it has gradually been applied in the information industry and in production, with good results. However, traditional reinforcement learning algorithms find optimized behavior through random exploration, so they learn slowly and converge late. To improve learning efficiency, a method is proposed in which the agent uses knowledge obtained from its own learning to guide and accelerate its subsequent learning. Q-learning is combined with a heuristic Shaping reward function so that remembered knowledge speeds up the agent's learning. The paper also proves that the policy optimized with the heuristic function is consistent with the one optimized without it. For a path planning problem, the potential field function generated during learning serves as the heuristic function and guides the exploration process. Experiments validate the method, analyze the effects of different parameter settings, and propose a way to handle the dead point problem. The results show that the method markedly accelerates reinforcement learning and yields an optimized search path.
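The combination of Q-learning with a Shaping reward described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 5x5 grid, the step cost of -1, the parameter values, and all function names are assumptions made for illustration. In particular, the potential here is a fixed negative Manhattan distance, whereas the paper builds its potential field from the agent's own learning (the abstract does not give that construction). The shaping term uses the potential-based form F(s, s') = γΦ(s') − Φ(s), the standard form under which optimal policies are known to be preserved; whether the paper's consistency proof uses exactly this form is also an assumption.

```python
import random

GAMMA = 0.9
SIZE = 5                                   # assumed 5x5 grid
START, GOAL = (0, 0), (SIZE - 1, SIZE - 1)
ACTIONS = [(0, 1), (1, 0), (0, -1), (-1, 0)]


def step(state, action):
    """Deterministic move; bumping into a wall leaves the agent in place."""
    x = min(max(state[0] + action[0], 0), SIZE - 1)
    y = min(max(state[1] + action[1], 0), SIZE - 1)
    nxt = (x, y)
    reward = 0.0 if nxt == GOAL else -1.0  # assumed step cost of -1
    return nxt, reward, nxt == GOAL


def potential(state):
    """Hypothetical heuristic: negative Manhattan distance to the goal."""
    return -float(abs(GOAL[0] - state[0]) + abs(GOAL[1] - state[1]))


def shaping(state, nxt, phi):
    """Potential-based shaping term F(s, s') = gamma * phi(s') - phi(s)."""
    return GAMMA * phi(nxt) - phi(state)


def q_learning(phi=None, episodes=300, alpha=0.5, epsilon=0.1, seed=0):
    """Tabular Q-learning; if phi is given, the shaping term is added to the reward."""
    rng = random.Random(seed)
    q = {((x, y), a): 0.0
         for x in range(SIZE) for y in range(SIZE) for a in range(4)}
    steps_per_episode = []
    for _ in range(episodes):
        state, done, steps = START, False, 0
        while not done and steps < 500:    # cap the episode length
            if rng.random() < epsilon:     # epsilon-greedy exploration
                a = rng.randrange(4)
            else:
                a = max(range(4), key=lambda i: q[(state, i)])
            nxt, reward, done = step(state, ACTIONS[a])
            if phi is not None:
                reward += shaping(state, nxt, phi)
            best_next = 0.0 if done else max(q[(nxt, i)] for i in range(4))
            q[(state, a)] += alpha * (reward + GAMMA * best_next - q[(state, a)])
            state, steps = nxt, steps + 1
        steps_per_episode.append(steps)
    return q, steps_per_episode
```

To compare the two settings under these assumptions, one could call `q_learning()` and `q_learning(phi=potential)` and compare the total number of steps each run needs, for example `sum(steps_per_episode)`. Along any trajectory the discounted shaping terms telescope to γ^T Φ(s_T) − Φ(s_0), which is why a potential of this form changes how fast the agent learns but not which policy is optimal.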

Keywords: reinforcement learning; Q-learning; heuristic search; Shaping function; path planning

Classification number: TP301 [Automation and Computer Technology: Computer System Architecture]

 
