检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
作 者:徐沛 黄凯奇[1,2,3] XU Pei;HUANG Kaiqi(Center for Research on Intelligent System and Engineering,Institute of Automation,Chinese Academy of Sciences,Beijing 100190,China;Chinese Academy of Sciences Center for Excellence in Brain Science and Intelligence Technology,Shanghai 200031,China;School of Artificial Intelligence,University of Chinese Academy of Sciences,Beijing 100049,China)
机构地区:[1]中国科学院自动化研究所智能系统与工程研究中心,北京100190 [2]中国科学院脑科学与智能技术卓越创新中心,上海200031 [3]中国科学院大学人工智能学院,北京100049
出 处:《图学学报》2024年第6期1165-1177,共13页Journal of Graphics
基 金:新一代人工智能国家科技重大专项(2022ZD0116403);国家资助博士后研究人员计划项目(GZC20232995);中国科学院战略性先导科技专项资助项目(XDA27010201)。
摘 要:深度强化学习作为支撑AlphaGo和ChatGPT等突破性工作的关键技术,已成为前沿科学的研究热点。在实际应用上,深度强化学习作为一种重要的智能决策技术,被广泛应用于视觉场景的避障、虚拟场景的优化生成、机器臂控制、数字化设计与制造、工业设计决策等多种规划决策任务。然而,深度强化学习在实际应用中面临样本效率低下的挑战,严重限制了其应用效果。为缓解这一问题,针对现有强化学习探索机制的不足,将大模型技术与多种主流探索技术相结合,提出了一种基于大模型引导的高效探索方法,以提升样本效率。通过利用大模型来指导深度强化学习智能体的探索行为,该方法在多个国际公认的测试环境中显示出显著的性能提升,不仅展示了大模型技术在深度强化学习探索问题中的潜力,也为实际应用中改善样本效率提供了新的解决思路。Deep reinforcement learning,as a key technology supporting breakthrough works such as AlphaGo and ChatGPT,has become a research hotspot in frontier science.In practical applications,deep reinforcement learning,as an important intelligent decision-making technology,is widely used in a variety of planning and decision-making tasks,such as obstacle avoidance in visual scenes,optimal generation of virtual scenes,robotic arm control,digital design and manufacturing,and industrial design decision-making.However,deep reinforcement learning faces the challenge of low sample efficiency in practical applications,which greatly limits its application effectiveness.In order to improve the sample efficiency,this paper proposes an efficient exploration method based on large model guidance,which combines the large model with the mainstream exploration techniques.Specifically,we utilize the semantic extraction capability of a large language model to obtain semantic information of states,which is then used to guide the exploration behavior of agents.Then,we introduce the semantic information into the classical methods in single-policy exploration and population exploration,respectively.By using the large model to guide the exploration behavior of deep reinforcement learning agents,our method shows significant performance improvement in popular environments.This research not only demonstrates the potential of large model techniques in deep reinforcement learning exploration problems,but also provides a new idea to alleviate the low sample efficiency problem in practical applications.
分 类 号:TP391[自动化与计算机技术—计算机应用技术] TP18[自动化与计算机技术—计算机科学与技术]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.49