Data-free knowledge distillation for target class classification


Authors: Xie Yitao; Su Lumei; Yang Fan [2]; Chen Yuhan (School of Electrical Engineering and Automation, Xiamen University of Technology, Xiamen 361024, China; Department of Automation, Xiamen University, Xiamen 361102, China)

Affiliations: [1] School of Electrical Engineering and Automation, Xiamen University of Technology, Xiamen 361024, China; [2] Department of Automation, Xiamen University, Xiamen 361102, China

Source: Journal of Image and Graphics (《中国图象图形学报》), 2024, No. 11, pp. 3401-3416 (16 pages)

Funding: National Natural Science Foundation of China (62173282); Natural Science Foundation of Fujian Province (2022J011255); Natural Science Foundation of Xiamen (3502Z20227180).

Abstract:
Objective: Knowledge distillation is a simple and effective method for compressing neural networks and has become a popular topic in model compression research. This method features a "teacher-student" architecture in which a large network guides the training of a small network to improve its performance in application scenarios, indirectly achieving network compression. In traditional methods, the training of the student model relies on the training data of the teacher, and the quality of the student model depends on the quality of that data; when data are scarce, these methods fail to produce satisfactory results. Data-free knowledge distillation addresses the lack of training data by introducing synthetic data. Such methods mainly synthesize training data by refining teacher-network knowledge, for example by using the intermediate representations of the teacher network for image inversion, or by employing the teacher network as a fixed discriminator that supervises a generator whose synthetic images are used to train the student network. Because it does not rely on the original training data of the teacher network, data-free knowledge distillation markedly expands the application scope of knowledge distillation, although the need to synthesize training data can make training less efficient than in traditional methods. Furthermore, practical applications often concern only a few target classes, yet existing data-free distillation methods struggle to learn the knowledge of the target classes selectively; when the teacher model covers many classes, convergence is difficult and the student model cannot be made sufficiently compact. This paper therefore proposes a novel data-free knowledge distillation method, masked distillation for target classes (MDTC), which lets the student model selectively learn the target-class knowledge of the teacher.
Method: Building on a generator that learns the batch-normalization statistics of the original training data, MDTC applies a mask that blocks the backpropagation of gradients for non-target classes during generator updates, so the generator synthesizes samples of the target classes only and the specific knowledge of the teacher model is extracted accurately. In addition, MDTC introduces the teacher model into the feature learning of the generator's intermediate layers, optimizing the generator's initial parameters and its update strategy to accelerate convergence.
Result: Thirteen sub-classification tasks of varying difficulty are designed on four standard image classification datasets to evaluate MDTC. Experimental results show that MDTC extracts the specific knowledge of the teacher model accurately and efficiently: its overall accuracy exceeds that of mainstream data-free distillation models while requiring less training time, and more than 40% of the student models even outperform their teachers, with a maximum accuracy gain of 3.6%.
Conclusion: The proposed method outperforms existing data-free distillation models overall; it learns knowledge especially efficiently on easy classification tasks and performs best when the target classes account for a small proportion of the teacher's classes.
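The Method section names two concrete mechanisms: the generator matches the batch-normalization statistics of the teacher's original training data, and a class mask blocks gradient backpropagation for non-target classes so that only target-class samples are synthesized. The following is a minimal PyTorch sketch of those two losses under stated assumptions (a conditional generator taking noise and labels, DeepInversion-style BN hooks); the helper names, loss weights, and the particular logit-masking formulation are illustrative assumptions, not the authors' released implementation, and the teacher-guided intermediate-feature learning that accelerates convergence is omitted.

    # Sketch of the class-masked generator objective described in the abstract
    # (illustrative assumptions, not the paper's released code).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class BNStatHook:
        """Collects a feature-statistics loss at each BatchNorm layer of the teacher,
        pushing synthetic images toward the running mean/variance of the original
        training data (DeepInversion-style regularization)."""
        def __init__(self, bn: nn.BatchNorm2d):
            self.loss = torch.tensor(0.0)
            self.handle = bn.register_forward_hook(self._hook)

        def _hook(self, module, inputs, output):
            x = inputs[0]
            mean = x.mean(dim=(0, 2, 3))
            var = x.var(dim=(0, 2, 3), unbiased=False)
            self.loss = F.mse_loss(mean, module.running_mean) + \
                        F.mse_loss(var, module.running_var)

    def masked_generator_loss(generator, teacher, target_classes, batch_size=64,
                              z_dim=128, bn_weight=1.0, device="cpu"):
        """One generator step: synthesize images for the target classes only."""
        hooks = [BNStatHook(m) for m in teacher.modules()
                 if isinstance(m, nn.BatchNorm2d)]

        z = torch.randn(batch_size, z_dim, device=device)
        # Pseudo-labels are drawn from the target classes only.
        targets = torch.as_tensor(target_classes, device=device)
        labels = targets[torch.randint(len(target_classes), (batch_size,), device=device)]

        images = generator(z, labels)      # assumed conditional generator
        logits = teacher(images)

        # Class mask: gradients flow back into the generator only through the
        # target-class logits; non-target columns are suppressed by a constant.
        mask = torch.zeros(logits.size(1), device=device)
        mask[targets] = 1.0
        masked_logits = logits * mask - 1e4 * (1.0 - mask)

        ce = F.cross_entropy(masked_logits, labels)   # target-class fidelity
        bn = sum(h.loss for h in hooks)               # BN-statistics alignment

        for h in hooks:
            h.handle.remove()
        return ce + bn_weight * bn

In a typical loop, the generator would be updated by minimizing this loss and the student would then be distilled on the synthesized batches, as described in the abstract.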

Keywords: deep learning; image classification; model compression; data-free knowledge distillation; generator

CLC number: TP391 (Automation and Computer Technology: Computer Application Technology)

 
