Exploration algorithm based on intrinsic curiosity and self-imitation learning (SIL)

Authors: LÜ Xianglin, ZANG Zhaoxiang [1,2], LI Sibo, ZOU Yaobin [1,2]

Affiliations: [1] Hubei Key Laboratory of Intelligent Vision Monitoring for Hydropower Engineering, China Three Gorges University, Yichang 443002, China; [2] School of Computer and Information, China Three Gorges University, Yichang 443002, China

Source: Modern Electronics Technique (《现代电子技术》), 2024, No. 16, pp. 137-144 (8 pages)

Funding: National Natural Science Foundation of China (61502274); Natural Science Foundation of Hubei Province (2015CFB336)

Abstract: To address the sparse rewards and missing information that deep reinforcement learning algorithms face in partially observable environments, a proximal policy optimization (PPO) algorithm combining a curiosity module with self-imitation learning (SIL) is proposed. The algorithm uses a random network to generate experience samples during exploration, selects high-quality samples with prioritized experience replay, imitates the best sequence trajectories through SIL, and updates a new policy network to guide exploration behavior. Ablation and comparison experiments in the Minigrid environment show that the proposed algorithm has a clear advantage in convergence speed and can complete more complex exploration tasks in partially observable environments.
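The abstract names two mechanisms layered on PPO: a random network that scores novelty (an RND-style curiosity bonus), and an SIL update that imitates trajectories whose returns beat the current value estimate. The following is a minimal sketch of those two pieces, not the paper's released code; network sizes, class names, and the value-loss weight are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Categorical


class RNDCuriosity(nn.Module):
    """Intrinsic reward = predictor's error against a frozen random target."""

    def __init__(self, obs_dim: int, feat_dim: int = 64):
        super().__init__()
        self.target = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                    nn.Linear(128, feat_dim))
        self.predictor = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                       nn.Linear(128, feat_dim))
        for p in self.target.parameters():  # the target network stays random
            p.requires_grad_(False)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Per-state prediction error: large on novel states, shrinks with
        # visits. It doubles as the intrinsic reward and the predictor's loss.
        return F.mse_loss(self.predictor(obs), self.target(obs),
                          reduction="none").mean(dim=-1)


class DiscretePolicy(nn.Module):
    """Actor-critic network with a categorical action head."""

    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.pi = nn.Sequential(nn.Linear(obs_dim, 128), nn.Tanh(),
                                nn.Linear(128, n_actions))
        self.v = nn.Sequential(nn.Linear(obs_dim, 128), nn.Tanh(),
                               nn.Linear(128, 1))

    def forward(self, obs: torch.Tensor):
        return Categorical(logits=self.pi(obs)), self.v(obs).squeeze(-1)


def sil_loss(policy: DiscretePolicy, obs, actions, returns):
    """SIL objective: imitate only transitions whose observed return exceeds
    the current value estimate, i.e. clip the advantage at zero."""
    dist, value = policy(obs)
    gap = (returns - value).clamp(min=0)            # (R - V)+, the SIL signal
    policy_term = -(dist.log_prob(actions) * gap.detach()).mean()
    value_term = 0.5 * (gap ** 2).mean()            # pulls V up toward good R
    return policy_term + 0.01 * value_term          # value weight is a guess
```

In a full agent along these lines, the (R - V)+ gap would also serve as the sampling priority, so prioritized experience replay keeps drawing the trajectories most worth imitating, while the PPO update on fresh rollouts is unchanged apart from adding the RND error to the environment reward.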

Keywords: curiosity module; self-imitation learning; deep reinforcement learning; proximal policy optimization; random network; prioritized experience replay

CLC numbers: TN911-34 (Electronics and Telecommunications: Communication and Information Systems); TP242.6 (Electronics and Telecommunications: Information and Communication Engineering)

 
