Research on deep reinforcement learning method based on improved curiosity

Authors: Qiao He [1]; Li Zenghui; Liu Chun; Hu Sidong

Affiliation: [1] School of Electrical & Control Engineering, Liaoning Technical University, Huludao, Liaoning 125105, China

Source: Application Research of Computers, 2024, No. 9, pp. 2635-2640 (6 pages)

Funding: National Natural Science Foundation of China (51604141, 51204087).

Abstract: In deep reinforcement learning, the intrinsic curiosity module (ICM) gives an agent opportunities to learn unknown policies in sparse-reward environments. However, because the curiosity reward is a state-difference value, the agent tends to over-focus on exploring new states, which leads to blind exploration. To address this problem, this paper proposes an intrinsic curiosity model algorithm based on knowledge distillation (KD-ICM). First, the algorithm introduces knowledge distillation so that the agent acquires richer environment information and policy knowledge in a shorter time, accelerating the learning process. Second, a pre-trained teacher network guides the forward network, yielding a forward model with higher accuracy and performance and reducing the agent's blind exploration. Two different simulation experiments were designed on the Unity simulation platform for comparison. The experiments show that in a complex simulation task environment, the average reward of KD-ICM is 136% higher than that of ICM and the optimal action probability is 13.47% higher, improving both the agent's exploration performance and the quality of its exploration, which verifies the feasibility of the algorithm.
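The abstract only sketches the mechanism, so a minimal illustration may help. Below is a hedged PyTorch sketch of the core idea as the abstract describes it: the forward model's prediction error serves as the curiosity reward, while a knowledge-distillation term pulls the student forward model toward a pre-trained teacher. The network shapes, the ForwardModel class, and the weight alpha are illustrative assumptions, not the paper's exact design.

```python
# Minimal sketch of the KD-ICM idea from the abstract (assumptions noted above).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ForwardModel(nn.Module):
    """Predicts the next state feature from (state feature, action)."""
    def __init__(self, feat_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + act_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, feat_dim),
        )

    def forward(self, phi_s, a):
        return self.net(torch.cat([phi_s, a], dim=-1))

def kd_icm_losses(student, teacher, phi_s, a, phi_s_next, alpha=0.5):
    """Forward-model loss plus a distillation term toward a pre-trained teacher.

    Returns (loss, intrinsic_reward). The intrinsic (curiosity) reward is the
    student forward model's per-sample prediction error; the distillation term
    pulls the student toward the teacher's predictions, which is the abstract's
    mechanism for reducing blind exploration.
    """
    pred_student = student(phi_s, a)
    with torch.no_grad():
        pred_teacher = teacher(phi_s, a)  # teacher is pre-trained and frozen
    # Curiosity reward: squared error between prediction and true next feature.
    intrinsic_reward = 0.5 * (pred_student - phi_s_next).pow(2).mean(dim=-1)
    forward_loss = intrinsic_reward.mean()
    # Knowledge-distillation loss: match the teacher's (more accurate) output.
    distill_loss = F.mse_loss(pred_student, pred_teacher)
    return forward_loss + alpha * distill_loss, intrinsic_reward.detach()
```

In training, the detached intrinsic reward would be added to the sparse environment reward before the proximal policy optimization update, consistent with the keywords listed below.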

Keywords: deep reinforcement learning; knowledge distillation; proximal policy optimization; sparse reward; intrinsic curiosity

Classification: TP181 [Automation and Computer Technology / Control Theory and Control Engineering]

 
