An efficient reinforcement learning method based on large language model


Authors: XU Pei, HUANG Kaiqi [1,2,3] (Center for Research on Intelligent System and Engineering, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; Chinese Academy of Sciences Center for Excellence in Brain Science and Intelligence Technology, Shanghai 200031, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China)

Affiliations: [1] Center for Research on Intelligent System and Engineering, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; [2] Chinese Academy of Sciences Center for Excellence in Brain Science and Intelligence Technology, Shanghai 200031, China; [3] School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China

Source: Journal of Graphics (图学学报), 2024, Issue 6, pp. 1165-1177 (13 pages)

Funding: National Science and Technology Major Project for New Generation Artificial Intelligence (2022ZD0116403); National Funded Postdoctoral Researcher Program (GZC20232995); Strategic Priority Research Program of the Chinese Academy of Sciences (XDA27010201).

Abstract: Deep reinforcement learning, the key technology behind breakthroughs such as AlphaGo and ChatGPT, has become a research hotspot in frontier science. As an important intelligent decision-making technology, it is widely applied in practice to planning and decision-making tasks such as obstacle avoidance in visual scenes, optimized generation of virtual scenes, robotic arm control, digital design and manufacturing, and industrial design decision-making. However, deep reinforcement learning suffers from low sample efficiency in practical applications, which greatly limits its effectiveness. To improve sample efficiency and address the shortcomings of existing exploration mechanisms, this paper proposes an efficient exploration method guided by a large model, combining large-model techniques with mainstream exploration techniques. Specifically, the semantic extraction capability of a large language model is used to obtain semantic information about states, which is then used to guide the exploration behavior of agents; this semantic information is incorporated into classical methods for single-policy exploration and population-based exploration, respectively. By using the large model to guide the exploration behavior of deep reinforcement learning agents, the method shows significant performance improvements in widely used benchmark environments. This work not only demonstrates the potential of large-model techniques for the exploration problem in deep reinforcement learning, but also provides a new way to alleviate the low sample efficiency problem in practical applications.
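The abstract gives only a high-level description of the method; the following is a minimal, hypothetical sketch of the general idea of LLM-guided exploration as a semantic-novelty intrinsic reward, not the paper's implementation. The helpers `describe_state` and `embed_text` are placeholders standing in for a real language-model query and text encoder.

```python
# Minimal sketch (assumption, not the paper's code): reward states whose
# LLM-derived semantic descriptions differ from those already visited.
import hashlib
import numpy as np

def describe_state(state) -> str:
    # Placeholder: in the paper an LLM would produce a semantic description
    # of the state; here we simply stringify it.
    return f"state: {state}"

def embed_text(text: str, dim: int = 16) -> np.ndarray:
    # Placeholder embedding: a real system would use an LLM / text encoder.
    h = hashlib.sha256(text.encode()).digest()
    rng = np.random.default_rng(int.from_bytes(h[:8], "little"))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

class SemanticNoveltyBonus:
    """Intrinsic bonus for states whose semantic embedding is far from memory."""
    def __init__(self, scale: float = 0.1):
        self.memory = []          # stored embeddings of visited states
        self.scale = scale

    def __call__(self, state) -> float:
        z = embed_text(describe_state(state))
        if not self.memory:
            self.memory.append(z)
            return self.scale
        sims = [float(z @ m) for m in self.memory]
        novelty = 1.0 - max(sims)  # low similarity to memory -> high novelty
        self.memory.append(z)
        return self.scale * max(novelty, 0.0)

# Usage inside a generic RL loop: shaped_reward = env_reward + bonus(state)
bonus = SemanticNoveltyBonus()
print(bonus((0, 0)), bonus((0, 0)), bonus((5, 7)))
```

The paper reports combining this kind of semantic signal with both single-policy and population-based exploration methods; the sketch above only illustrates the single-policy (intrinsic bonus) flavor under the stated assumptions.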

Keywords: deep reinforcement learning; large language model; efficient exploration

CLC Number: TP391 (Automation and Computer Technology / Computer Application Technology); TP18 (Automation and Computer Technology / Computer Science and Technology)

 
