Affiliations: [1] School of Computer Science and Engineering, Tianjin University of Technology, Tianjin 300382, China; [2] State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Source: Journal of Image and Graphics (《中国图象图形学报》), 2024, No. 12, pp. 3770-3785 (16 pages)
Funding: National Natural Science Foundation of China (62202331).
Abstract: Objective: Knowledge distillation aims to transfer the knowledge of a powerful teacher model with a large number of parameters to a lightweight student model without compromising the performance of the original model. In image classification, most previous distillation methods focus on extracting global information while neglecting the importance of local information. Moreover, these methods are mostly built around a single-teacher architecture and ignore the potential for the student to learn from multiple teachers simultaneously. This paper therefore proposes a dual-teacher collaborative knowledge distillation framework that fuses global and local features. Method: A randomly initialized teacher (the scratch teacher) is first trained synchronously with the student on the global information, and its intermediate global outputs progressively guide the student toward the teacher's final prediction along an optimal path. A pretrained teacher (the expert teacher) is additionally introduced to handle local information. The expert teacher separates its local feature outputs into source-class knowledge and other-class knowledge and transfers them to the student separately, providing more comprehensive supervision. Result: Experiments are conducted on the CIFAR-100 (Canadian Institute for Advanced Research) and TinyImageNet datasets and compared with other distillation methods. On CIFAR-100, compared with the recent NKD (normalized knowledge distillation), the average classification accuracy improves by 0.63% and 1.00% under homogeneous and heterogeneous teacher-student architectures, respectively. On TinyImageNet, with a ResNet34 (residual network) teacher and a MobileNetV1 student, the classification accuracy is 1.09% higher than SRRL (knowledge distillation via softmax regression representation learning) and 1.06% higher than NKD. Ablation experiments and visualization analysis on CIFAR-100 further verify the effectiveness of the proposed method. Conclusion: The proposed dual-teacher collaborative knowledge distillation framework fuses global and local features and separates the model's output responses into source-class knowledge and other-class knowledge that are transferred to the student separately, enabling the student model to achieve higher image classification accuracy.
Objective: Knowledge distillation aims to transfer the knowledge of a teacher model with powerful performance and a large number of parameters to a lightweight student model and improve its performance without affecting the performance of the original model. Previous research on knowledge distillation has mostly focused on distillation from one teacher to one student and neglected the potential for students to learn from multiple teachers simultaneously. Multi-teacher distillation can help the student model synthesize the knowledge of each teacher model, thereby improving its expressive ability. Few studies have examined distillation from teacher models across these different situations, yet learning from multiple teachers at the same time can integrate additional useful knowledge and information and consequently improve student performance. In addition, most existing knowledge distillation methods focus only on the global information of the image and ignore the importance of spatial local information. In image classification, local information refers to the features and details of specific regions in the image, including textures, shapes, and boundaries, which play important roles in distinguishing various image categories. The teacher network can distinguish local regions based on these details and make accurate predictions for similar appearances in different categories, but the student network may fail to do so. To address these issues, this article proposes a knowledge distillation method based on global and local dual-teacher collaboration, which integrates global and local information and effectively improves the classification accuracy of the student model. Method: The original input image is initially represented as global and local image views. The original image (global image view) is randomly cropped locally, with the ratio of the cropped area to the original image constrained to 40%~70%, to obtain the local input information (local image view). Afterward, a teacher (scratch teacher) is randomly initialized and trained synchronously with the student on the global view, and its intermediate global outputs progressively guide the student toward the teacher's final prediction.
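The abstract describes two concrete mechanisms that lend themselves to a short illustration: a local image view obtained by randomly cropping 40%~70% of the original image area, and a split of the teacher's soft output into source-class (target-class) knowledge and other-class knowledge that are transferred separately. The PyTorch sketch below is a minimal, hypothetical rendering of these two ideas, assuming a standard RandomResizedCrop for view generation and a decoupled-logit-style KL loss for the knowledge split; the function names, temperature, and loss weighting are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F
from torchvision import transforms

def make_views(image_size=32):
    # Hypothetical view generation: the global view keeps the full image,
    # the local view crops 40%-70% of the original area as stated in the abstract.
    global_view = transforms.Compose([
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
    ])
    local_view = transforms.Compose([
        transforms.RandomResizedCrop(image_size, scale=(0.4, 0.7)),  # 40%-70% of the area
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
    ])
    return global_view, local_view

def split_class_kd_loss(student_logits, teacher_logits, target, T=4.0):
    # Hypothetical separation of the soft output into source-class and
    # other-class knowledge, in the spirit of decoupled logit distillation.
    p_s = F.softmax(student_logits / T, dim=1)
    p_t = F.softmax(teacher_logits / T, dim=1)
    mask = F.one_hot(target, num_classes=student_logits.size(1)).bool()

    # Source-class knowledge: binary distribution (target class vs. all others).
    s_src = torch.stack([p_s[mask], 1.0 - p_s[mask]], dim=1)
    t_src = torch.stack([p_t[mask], 1.0 - p_t[mask]], dim=1)
    loss_src = F.kl_div(s_src.log(), t_src, reduction="batchmean")

    # Other-class knowledge: distribution over the non-target classes only,
    # obtained by masking the target logit before the softmax.
    s_oth = F.log_softmax(student_logits / T - 1000.0 * mask, dim=1)
    t_oth = F.softmax(teacher_logits / T - 1000.0 * mask, dim=1)
    loss_oth = F.kl_div(s_oth, t_oth, reduction="batchmean")

    return (loss_src + loss_oth) * T * T
```

In the full framework the scratch teacher would additionally be optimized jointly with the student on the global view while the expert teacher supervises the local view; that training loop is omitted here.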
Keywords: knowledge distillation (KD); image classification; lightweight model; collaborative distillation; feature fusion
Classification code: TP391 [Automation and Computer Technology - Computer Application Technology]