Source: Journal of Image and Graphics (《中国图象图形学报》), 2023, No. 4, pp. 935-962 (28 pages)
Funding: Key Program of the National Natural Science Foundation of China (61932022).
Abstract: Computer vision aims to build computational models that approximate the human visual system. With the development of deep neural networks (DNNs), the analysis and understanding of high-level semantics has become a central research topic in computer vision. High-level semantics are human-understandable, expressible descriptors of the content of media signals such as images and videos; typical high-level semantic analysis tasks include image classification, object detection, instance segmentation, semantic segmentation, video scene recognition, and object tracking. Deep-neural-network-based algorithms have steadily raised the performance of these tasks, but at the cost of growing model size and decreasing computational efficiency. Many model compression strategies have been proposed to obtain lighter and faster models, e.g., pruning, weight quantization, and low-rank factorization, but these may alter the network structure or suffer severe performance drops when deployed on vision tasks. Model distillation is a model compression approach based on transfer learning. It typically uses a large pre-trained model as a teacher and extracts its effective representations, such as model outputs, hidden-layer features, or inter-feature similarities; these representations then serve as additional supervision signals, alongside the original ground truth, for training a smaller and faster student model, with the goal of raising the small model's performance so that it can replace the large one. Because model distillation offers a favorable trade-off between model performance and computational complexity, it is increasingly used in deep-learning-based high-level semantic analysis. Since the concept was introduced in 2014, researchers have developed a large number of distillation methods for high-level semantic analysis, applied most widely to image classification, object detection, and semantic segmentation. This paper surveys and summarizes representative model distillation approaches for these typical tasks, organized by visual task. First, starting from the most mature and most widely applied distillation methods for classification, we introduce their design ideas and application scenarios, present comparisons of selected experimental results, and point out how the conditions for applying distillation to classification differ from those for detection and segmentation. Next, we introduce several typical distillation methods specially designed for object detection and semantic segmentation, explaining their design goals and ideas in connection with the model structures, together with comparisons and analysis of selected experimental results. Finally, we summarize the current state of model distillation methods in high-level semantic analysis and discuss open problems and future directions.
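As a concrete illustration of the teacher-student scheme summarized above, below is a minimal sketch of logit-based distillation in the style of the 2014 formulation, written in Python with PyTorch. The function name distillation_loss and the default values of the temperature T and the weight alpha are illustrative assumptions, not taken from the paper; any pretrained teacher/student classifier pair could be substituted.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
        # Soft-target term: KL divergence between the temperature-softened
        # teacher and student output distributions. The T*T factor keeps
        # gradient magnitudes comparable across temperatures.
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=1),
            F.softmax(teacher_logits / T, dim=1),
            reduction="batchmean",
        ) * (T * T)
        # Hard-label term: standard cross-entropy against the ground truth,
        # i.e., the original supervision signal kept alongside the teacher's.
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1.0 - alpha) * hard

    # Typical training step: the teacher runs frozen; only the student learns.
    # teacher.eval()
    # with torch.no_grad():
    #     teacher_logits = teacher(images)
    # loss = distillation_loss(student(images), teacher_logits, labels)
    # loss.backward()

Feature-based and similarity-based variants discussed in the survey replace the logit terms above with losses on hidden-layer features or on pairwise feature similarities, but follow the same teacher-as-extra-supervision pattern.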
Keywords: model distillation; deep learning; image classification; object detection; semantic segmentation; transfer learning
Classification: TP37 [Automation and Computer Technology: Computer System Architecture]