Exploration Algorithm Based on Intrinsic Curiosity and Self-Imitation Learning (SIL)

Authors: LÜ Xianglin, ZANG Zhaoxiang[1,2], LI Sibo, ZOU Yaobin[1,2]

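Affiliations: [1] Hubei Key Laboratory of Intelligent Vision Monitoring for Hydropower Engineering, China Three Gorges University, Yichang 443002, China; [2] School of Computer and Information, China Three Gorges University, Yichang 443002, China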

Source: Modern Electronics Technique (《现代电子技术》), 2024, No. 16, pp. 137-144 (8 pages)

Funding: National Natural Science Foundation of China (61502274); Natural Science Foundation of Hubei Province (2015CFB336)

Abstract: To address the sparse rewards and missing information that deep reinforcement learning algorithms face in partially observable environments, a proximal policy optimization (PPO) algorithm combining a curiosity module with self-imitation learning (SIL) is proposed. The algorithm uses a random network to generate experience samples during exploration, then applies prioritized experience replay to select high-quality samples. SIL imitates the best-performing trajectories, and the result is used to update a new policy network that guides exploration. Ablation and comparison experiments were performed in the Minigrid environment. The results show that the proposed algorithm converges significantly faster and can complete more complex exploration tasks in partially observable environments.
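The abstract does not give the update rules, so the following is only a minimal NumPy sketch of how its three ingredients could fit together, under stated assumptions. The "random network" curiosity module is assumed to be a random network distillation (RND)-style bonus: a fixed, randomly initialised target network plus a trained predictor, with the prediction error used as the intrinsic reward. The SIL step is assumed to follow the clipped-advantage form of the original self-imitation learning method, with replay priority proportional to max(R - V, 0). The names `RandomNetworkCuriosity`, `discounted_returns`, `sil_sample` and `sil_losses`, and all hyperparameters, are hypothetical. The PPO update itself is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)


class RandomNetworkCuriosity:
    """Hypothetical RND-style curiosity: intrinsic reward = error vs. a fixed random net."""

    def __init__(self, obs_dim, feat_dim=16, lr=0.01):
        # Fixed, randomly initialised target network (never trained).
        self.w_target = rng.normal(size=(obs_dim, feat_dim))
        # Linear predictor trained to reproduce the target's features.
        self.w_pred = np.zeros((obs_dim, feat_dim))
        self.lr = lr

    def _error(self, obs):
        return obs @ self.w_pred - np.tanh(obs @ self.w_target)

    def intrinsic_reward(self, obs):
        # obs: (batch, obs_dim). Rarely visited states keep a large error.
        return np.mean(self._error(obs) ** 2, axis=1)

    def update(self, obs):
        # One gradient-descent step on the mean squared prediction error.
        err = self._error(obs)
        grad = 2.0 * obs.T @ err / (obs.shape[0] * err.shape[1])
        self.w_pred -= self.lr * grad


def discounted_returns(rewards, gamma=0.99):
    """Monte Carlo return R_t = r_t + gamma * R_{t+1} for one episode."""
    out = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out


def sil_sample(returns, values, batch_size, alpha=0.6, eps=1e-6):
    """Prioritized sampling with priority (R - V)+ raised to alpha."""
    adv = np.maximum(returns - values, 0.0)
    prio = (adv + eps) ** alpha
    return rng.choice(len(returns), size=batch_size, p=prio / prio.sum())


def sil_losses(log_probs, returns, values):
    """SIL losses: only samples whose return beats the value estimate contribute."""
    adv = np.maximum(returns - values, 0.0)
    policy_loss = -np.mean(log_probs * adv)  # adv is treated as a constant
    value_loss = 0.5 * np.mean(adv ** 2)
    return policy_loss, value_loss


if __name__ == "__main__":
    obs = rng.normal(size=(5, 4))
    curiosity = RandomNetworkCuriosity(obs_dim=4)
    r_int = curiosity.intrinsic_reward(obs)
    curiosity.update(obs)
    r_ext = np.array([0.0, 0.0, 0.0, 0.0, 1.0])  # sparse reward at the goal
    rets = discounted_returns(r_ext + 0.01 * r_int)
    values = np.full(5, 0.5)
    idx = sil_sample(rets, values, batch_size=3)
    print(sil_losses(np.log(np.full(3, 0.25)), rets[idx], values[idx]))
```

Because the clipped advantage is zero whenever the return does not exceed the value estimate, only trajectories that did better than expected are imitated, which is the property that makes SIL useful for reinforcing the rare successes produced under sparse rewards. In a real implementation the losses would be computed in an autodiff framework and added to the PPO objective.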

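Keywords: curiosity module; self-imitation learning; deep reinforcement learning; proximal policy optimization; random network; prioritized experience replay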

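CLC numbers: TN911-34 [Electronics and Telecommunications: Communication and Information Systems]; TP242.6 [Electronics and Telecommunications: Information and Communication Engineering]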
