Authors: LIU Xiao-feng [1]; LIU Zhi-bin [2]; DONG Zhao-an [2]
Affiliations: [1] Library, Qufu Normal University, Rizhao 276826, Shandong, China; [2] School of Computer Science, Qufu Normal University, Rizhao 276826, Shandong, China
Source: Computer Technology and Development, 2023, No. 6, pp. 168-172, 180 (6 pages)
Fund: Natural Science Foundation of Shandong Province (ZR2020MF149).
Abstract: This paper studies reinforcement learning in the field of artificial intelligence. Because reinforcement learning does not depend on model information when solving optimization problems, it has gradually found applications in the information and production industries, with good results. However, traditional reinforcement learning algorithms obtain optimized behavior through random exploration, which leads to slow learning and delayed convergence. To improve the efficiency of reinforcement learning, a method is proposed that lets an agent use the knowledge gained from its own learning to guide and accelerate its subsequent learning. Q-learning is combined with a heuristic shaping reward function, so that remembered knowledge accelerates the agent's learning process. In addition, it is proved that policy optimization is consistent whether or not the heuristic function is used. For a path-planning problem, the potential field function generated during learning is adopted as the heuristic function, which guides the exploration process of reinforcement learning. The method is validated in experiments, the effects of different parameter settings are analyzed, and a method for resolving the dead-point problem is proposed. The results show that the proposed method significantly accelerates the reinforcement learning process and yields an optimized search path.
Keywords: reinforcement learning; Q-learning; heuristic search; shaping function; path planning
Classification: TP301 [Automation and Computer Technology: Computer System Architecture]
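The combination the abstract describes (Q-learning plus a potential-based shaping reward guiding exploration on a path-planning task) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the 5x5 grid, learning rate, episode count, and the Manhattan-distance potential are illustrative assumptions standing in for the learned potential field described in the abstract.

```python
import random

GRID = 5                       # assumed grid size for the path-planning task
GOAL = (GRID - 1, GRID - 1)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
GAMMA = 0.9
ALPHA = 0.5
EPSILON = 0.1

def phi(s):
    # Illustrative potential: negative Manhattan distance to the goal,
    # so states closer to the goal have higher potential.
    return -(abs(GOAL[0] - s[0]) + abs(GOAL[1] - s[1]))

def step(s, a):
    # Move within grid bounds; walls clamp the position.
    nx = min(max(s[0] + a[0], 0), GRID - 1)
    ny = min(max(s[1] + a[1], 0), GRID - 1)
    s2 = (nx, ny)
    base = 0.0 if s2 == GOAL else -1.0         # environment reward: -1 per step
    shaping = GAMMA * phi(s2) - phi(s)         # F(s, s') = gamma*Phi(s') - Phi(s)
    return s2, base + shaping

def train(episodes=300, seed=0):
    # Tabular Q-learning with the shaped reward guiding exploration.
    rng = random.Random(seed)
    Q = {((x, y), a): 0.0
         for x in range(GRID) for y in range(GRID)
         for a in range(len(ACTIONS))}
    for _ in range(episodes):
        s = (0, 0)
        while s != GOAL:
            if rng.random() < EPSILON:                  # epsilon-greedy exploration
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda k: Q[(s, k)])
            s2, r = step(s, ACTIONS[a])
            best_next = max(Q[(s2, k)] for k in range(len(ACTIONS)))
            Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
            s = s2
    return Q

def greedy_path(Q, limit=50):
    # Follow the learned greedy policy from the start to the goal.
    s, path = (0, 0), [(0, 0)]
    while s != GOAL and len(path) < limit:
        a = max(range(len(ACTIONS)), key=lambda k: Q[(s, k)])
        s, _ = step(s, ACTIONS[a])
        path.append(s)
    return path
```

The potential-based form F(s, s') = gamma*Phi(s') - Phi(s) is what makes the policy-consistency claim in the abstract plausible: shaping of this form is known not to change which policies are optimal, while the distance-based potential still biases exploration toward the goal.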