Activation Map Adaptation Model for Knowledge Distillation

Authors: WU Zhiyuan; QI Hong [1,3]; JIANG Yu [1,3]; CUI Chupeng; YANG Zongmin; XUE Xinhui

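Affiliations: [1] College of Computer Science and Technology, Jilin University, Changchun 130012, China; [2] Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; [3] Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China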

Source: Journal of Jilin University (Science Edition), 2022, No. 4, pp. 881-888 (8 pages)

Funding: National Natural Science Foundation of China (Grant Nos. U20A20285, 62072211, 51939003).

Abstract: Embedded and mobile devices have limited computational and storage resources, and the optimization of compact networks tends to converge to poor local optima. To address these problems, we propose an activation map adaptation model for knowledge distillation, which consists of an activation map adapter and an activation map adaptation knowledge distillation strategy. First, the activation map adapter stacks heterogeneous convolutions and visual feature expression modules to achieve activation map size matching, synchronous transformation of teacher and student network features, and adaptive semantic information matching. Second, the activation map adaptation knowledge distillation strategy embeds the adapter into the teacher network to reconstruct it, and adaptively searches during training for supervision features suited to the hidden layers of the student network. The output of the front part of the adapter is used as a hint for training the front part of the student network, which transfers knowledge from the teacher network to the student network; the student network is then further optimized under a learning-rate constraint. Finally, experiments on the image classification dataset CIFAR-10 show that the activation map adaptation knowledge distillation model improves classification accuracy by 0.6%, reduces inference loss by 6.5%, and reduces the time to converge to 78.2% accuracy to 5.6% of that required without transfer.
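The hint step described in the abstract (an adapted teacher feature map supervising a hidden layer of the student) can be illustrated with a minimal sketch. The abstract does not specify the loss function or the adapter internals, so everything below is an assumption for illustration: `avg_pool2x2` is a fixed 2x2 average pooling used as a stand-in for the learned adapter's size matching (the paper uses heterogeneous convolutions), and `hint_loss` is a plain mean squared error. The names `FeatureMap`, `shape`, `avg_pool2x2`, and `hint_loss` are hypothetical.

```python
from typing import List, Tuple

# A feature map as nested lists: channels x height x width.
FeatureMap = List[List[List[float]]]


def shape(fmap: FeatureMap) -> Tuple[int, int, int]:
    """Return (channels, height, width) of a rectangular feature map."""
    channels = len(fmap)
    height = len(fmap[0]) if channels else 0
    width = len(fmap[0][0]) if height else 0
    return channels, height, width


def avg_pool2x2(fmap: FeatureMap) -> FeatureMap:
    """Halve the spatial size with 2x2 average pooling.

    Assumption: this fixed pooling only stands in for the learned
    adapter, which the abstract says performs size matching.
    """
    pooled: FeatureMap = []
    for channel in fmap:
        rows = []
        for i in range(0, len(channel) - 1, 2):
            row = []
            for j in range(0, len(channel[i]) - 1, 2):
                total = (channel[i][j] + channel[i][j + 1]
                         + channel[i + 1][j] + channel[i + 1][j + 1])
                row.append(total / 4.0)
            rows.append(row)
        pooled.append(rows)
    return pooled


def hint_loss(teacher_hint: FeatureMap, student_feat: FeatureMap) -> float:
    """Mean squared error between two equally shaped feature maps.

    Assumption: MSE is used here for illustration; the abstract does
    not state which distance the authors minimize.
    """
    if shape(teacher_hint) != shape(student_feat):
        raise ValueError("feature maps must have the same shape")
    total, count = 0.0, 0
    for t_ch, s_ch in zip(teacher_hint, student_feat):
        for t_row, s_row in zip(t_ch, s_ch):
            for t, s in zip(t_row, s_row):
                total += (t - s) ** 2
                count += 1
    if count == 0:
        raise ValueError("feature maps must not be empty")
    return total / count


# Toy example: a 1x4x4 teacher map is reduced to 1x2x2 to match the student.
teacher_map = [[[1.0, 1.0, 3.0, 3.0],
                [1.0, 1.0, 3.0, 3.0],
                [5.0, 5.0, 7.0, 7.0],
                [5.0, 5.0, 7.0, 7.0]]]
student_map = [[[1.0, 3.0],
                [5.0, 5.0]]]

adapted = avg_pool2x2(teacher_map)
loss = hint_loss(adapted, student_map)
```

In a real training loop this scalar would be minimized with respect to the student's front layers, and, per the abstract, the adapter itself would be trained inside the teacher network rather than fixed as it is here.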

Keywords: artificial intelligence; knowledge distillation; activation map adaptation; model transfer; image classification

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
