Authors: Jing Liu (刘静) [1,2]; Longteng Guo (郭龙腾) (Institute of Automation, Chinese Academy of Sciences, Beijing 100190; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100090)
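Affiliations: [1] Institute of Automation, Chinese Academy of Sciences, Beijing 100190; [2] School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100190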
Source: Bulletin of National Natural Science Foundation of China (《中国科学基金》), 2023, No. 5, pp. 793-802 (10 pages)
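Funding: Supported by the Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2022ZD0118801) and the National Natural Science Foundation of China (U21B2043).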
Abstract: ChatGPT, a conversational chatbot, has swept across society with almost unstoppable momentum, heralding the dawn of artificial general intelligence. Its upgraded version, GPT-4, is a multimodal foundation model that goes beyond monotonous text interaction and accepts combinations of text and images as multimodal input. Compared with traditional unimodal foundation models, multimodal foundation models are more consistent with the multi-channel way humans perceive and understand the world, which allows them to handle more complex and varied environments, scenes, and tasks. GPT-4 demonstrates that incorporating natural language understanding and generation abilities grounded in human knowledge into multimodal foundation models can greatly enhance their capabilities in multimodal understanding, generation, and interaction. This article introduces the concept of multimodal foundation models, their key technologies, recent advances, and application scenarios, as well as the technical characteristics of GPT-4. It then focuses on several insights that large language models such as GPT-4 offer for building multimodal foundation models. Specifically, it discusses how to fully leverage the language capabilities of large language models so that, with the help of language, multimodal foundation models can better perceive and understand the world, create and generate content, and interact with humans and the environment.
Keywords: GPT-4; multimodal foundation models; multimodal understanding; multimodal generation; multimodal interaction
Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]