Authors: Zhang Yijun (张燚钧); Zhang Runqing (张润清); Zhou Huajian (周华健); Qi Ji (齐骥); Yu Zhaofei (余肇飞); Huang Tiejun (黄铁军) [2] (Platform Product Department, China Mobile (Suzhou) Software Technology Co., Ltd., Suzhou 215000, China; School of Computer Science, Peking University, Beijing 100190, China)
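Affiliations: [1] Platform Product Department, China Mobile (Suzhou) Software Technology Co., Ltd., Suzhou 215000 [2] School of Computer Science, Peking University, Beijing 100190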
Source: Journal of Image and Graphics (《中国图象图形学报》), 2025, No. 1, pp. 1-24 (24 pages)
Funding: National Natural Science Foundation of China (62088102).
Abstract: In the field of computer vision, traditional deep learning vision models have exhibited remarkable performance on specific tasks. However, their heavy dependence on large amounts of annotated data and their limited ability to generalize to new scenes significantly raise usage costs and restrict the application scope of these models. Recently, novel model architectures centered on the Transformer, particularly as applied in self-supervised learning, have emerged as solutions to these challenges. These models, typically pre-trained on large-scale data, demonstrate strong generalization in complex visual scenarios and are widely referred to as vision foundation models. This paper examines the current research status and future trends of vision foundation models, with a focus on key technological advances in this field and their potential impact on future developments in computer vision. The paper begins by reviewing and organizing the background and developmental history of vision foundation models, followed by an introduction to the key model structures that have emerged along this trajectory. It then introduces and analyzes the design ideas behind the various pre-training tasks used to construct vision foundation models, and categorizes the existing models based on their characteristics. Additionally, the paper presents representative works for the different types of vision foundation models and compiles the datasets currently available for pre-training these models. Finally, the paper summarizes the current research status of vision foundation models, reflects on existing challenges, and anticipates potential future research directions.

This paper offers an expansive examination of the landscape of vision foundation models, chronicling their evolution and current achievements and charting a course for future research. It acknowledges the transformative impact of deep learning on computer vision, shifting the paradigm from traditional computational methods to models t
Keywords: foundation model; computer vision (CV); pre-trained model; self-supervised learning; multi-task learning
Classification number: TP37 [Automation and Computer Technology: Computer System Architecture]