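Affiliations: [1] MoE Key Laboratory of Artificial Intelligence, Shanghai Jiao Tong University, Shanghai 200240 [2] School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215021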
Source: Chinese Journal of Computers (《计算机学报》), 2024, No. 4, pp. 790-820 (31 pages)
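Funding: Supported by the Excellent Young Scientists Fund of the National Natural Science Foundation of China (No. 62222607); the Shanghai Municipal Science and Technology Major Project (No. 2021SHZDZX0102); and the National Natural Science Foundation of China (No. 62002252).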
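Abstract: Since prompt learning was proposed in natural language processing, it has attracted growing attention from researchers. By reformulating various downstream tasks in the form of the pre-training task, it applies large-scale pre-trained models to a wide range of natural-language downstream tasks in a parameter-efficient and data-efficient way. Models represented by the GPT series have achieved great success through prompt learning on tasks such as dialogue generation and multimodal image-text understanding. However, these models and methods still cannot solve dense tasks in vision. Inspired by this, researchers have gradually applied prompt learning to a wide range of vision-related tasks, such as image recognition, object detection, image segmentation, domain adaptation, and continual learning. Because there is not yet a survey of prompt learning in vision-related fields, this paper presents a comprehensive discussion and analysis of prompt learning methods in the unimodal vision field and the multimodal vision-language field. As background, we first briefly introduce pre-trained models in natural language processing and describe and categorize the basic concepts of prompt learning, its downstream application forms, and the types of prompt templates. Second, we introduce the pre-trained models and tasks to which prompt learning methods are adapted in the unimodal vision and multimodal vision-language fields. Third, we introduce the prompt learning methods in these two fields. In natural language processing, prompt learning mainly aims to unify multiple tasks by inheriting the pre-training form; in vision-related fields, by contrast, prompt learning methods are designed for specific downstream tasks. We therefore first give a brief classification by method design and then describe unimodal vision and multimodal vision-language prompt learning methods in detail from the perspective of application tasks. Finally, we compare the progress of prompt learning research in natural language processing and in vision-related fields and give an outlook on future research directions.
English abstract: With the rapid development of deep learning models and their increasing parameter size, fine-tuning the entire model for various downstream applications with different objectives is prohibitive. To address this issue, prompt learning was first proposed in the field of natural language processing (NLP) and has been widely studied in recent years. By reformulating various downstream tasks in the same form as the pre-training task, prompt learning leverages large-scale pre-trained language models in various downstream applications with great efficiency from both the parameter and data perspectives. Among them, models pre-trained by masked language modeling (MLM), represented by BERT, have achieved great success with "cloze prompts" in tasks requiring word-level output, such as text classification and named entity recognition; models pre-trained via autoregressive/causal language modeling (A/CLM), such as GPT, have been widely applied with "prefix prompts" to tasks requiring text-level output, including dialogue generation, question answering, and summarization. Following the success of prompt learning in NLP, language models have also been applied to multimodal vision-language understanding problems through prompt learning. However, they still cannot solve dense tasks in vision-related areas. In addition, the expensive and complex process of fine-tuning the entire vision model in practical applications also arises in vision-related areas. Inspired by its great success in NLP, prompt learning has gradually been applied to various vision-related tasks, including image classification, object detection, image segmentation, domain adaptation, and continual learning. Given the lack of a comprehensive survey of prompt learning in vision, this paper provides a comprehensive introduction to and analysis of prompt learning methods in the unimodal vision area and the multimodal vision-language area. First, we briefly introduce the pre-training models, the basic concepts of prompt learning, the forms of downs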
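Keywords: large-scale pre-trained models; natural language processing; unimodal vision prompt learning; multimodal vision-language prompt learning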
Classification code: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]