Affiliations: [1] School of Computer Science, South China Normal University, Guangzhou 510631 [2] College of Electronic and Information Engineering, Tongji University, Shanghai 201804 [3] School of Philosophy and Social Development, South China Normal University, Guangzhou 510631 [4] School of Software and Microelectronics, Peking University, Beijing 102600 [5] R&D Department, DataGrand Intelligence (Shenzhen) Co., Ltd., Shenzhen 518063, Guangdong
Source: Chinese Journal of Computers (《计算机学报》), 2022, No. 3, pp. 624-653 (30 pages)
Funding: Supported by the National Natural Science Foundation of China (61772366, U1811263, 61972328), the Natural Science Foundation of Shanghai (17ZR1445900), and the Science and Technology Planning Project of Guangdong Province (2019B090905005)
Abstract: High-performance deep learning networks are usually computation- and parameter-intensive, which makes them difficult to deploy on resource-constrained edge devices. To run deep learning models on such low-resource devices, efficient small-scale networks are needed. Knowledge distillation is an emerging method for obtaining efficient small-scale networks: its main idea is to transfer the "knowledge" in a complex teacher model with strong learning ability to a simple student model, and the student improves its generalization ability by imitating the "dark knowledge" of the teacher. At the same time, knowledge distillation can markedly enhance model performance by exploiting optimization strategies such as mutual learning and self-learning among neural networks, and data resources such as unlabeled and cross-modal data. Owing to these advantages in both model compression and model enhancement, knowledge distillation has become a research hotspot and focus in deep learning. Several surveys of knowledge distillation already exist, but they lack a systematic, global, and comprehensive view: earlier investigations have overlooked the prospects of knowledge distillation for model enhancement, and they have paid little attention to structural knowledge, which is an indispensable part of a network's knowledge; over the past two years, both aspects have become increasingly important for improving the performance of student models. To overcome these shortcomings, this paper comprehensively surveys recent research on knowledge distillation from the perspectives of fundamentals, theoretical methods, and applications. Specifically, it (1) reviews the background of knowledge distillation, including its origin and core idea; (2) explains the working mechanism of knowledge distillation; (3) categorizes the different forms of knowledge in distillation into output feature knowledge, intermediate feature knowledge, relational feature knowledge, and structural feature knowledge; (4) analyzes and compares the key distillation methods in detail, including knowledge amalgamation, multi-teacher learning, teacher assistants, cross-modal distillation, mutual distillation, lifelong distillation, and self-distillation; (5) introduces methods that combine knowledge distillation with other techniques, including generative adversarial networks, neural architecture search, reinforcement learning, graph convolution, other compression techniques, autoencoders, ensemble learning, and federated learning; (6) elaborates on application scenarios of knowledge distillation across a variety of fields; and (7) discusses the open challenges of knowledge distillation and future research directions.
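To make the core mechanism summarized above concrete, the following is a minimal sketch (not from the surveyed paper) of the classic soft-target distillation loss in PyTorch, in which the student imitates the teacher's temperature-softened outputs while still learning from hard labels; the function name, temperature T, and weight alpha are illustrative assumptions.

# Minimal knowledge-distillation loss sketch; hyperparameter values are assumptions.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Blend imitation of the teacher's softened outputs with hard-label cross-entropy."""
    # Soft targets: a temperature T > 1 exposes the teacher's "dark knowledge",
    # i.e. the relative probabilities it assigns to the non-target classes.
    soft_targets = F.softmax(teacher_logits / T, dim=1)
    soft_student = F.log_softmax(student_logits / T, dim=1)
    # KL divergence between the softened distributions, scaled by T^2 so its
    # gradient magnitude stays comparable to the hard-label term.
    kd_term = F.kl_div(soft_student, soft_targets, reduction="batchmean") * (T * T)
    # Standard supervised loss on the ground-truth labels.
    ce_term = F.cross_entropy(student_logits, labels)
    return alpha * kd_term + (1.0 - alpha) * ce_term

# Example usage with random tensors standing in for teacher/student forward passes.
if __name__ == "__main__":
    student_logits = torch.randn(8, 10, requires_grad=True)
    teacher_logits = torch.randn(8, 10)
    labels = torch.randint(0, 10, (8,))
    loss = distillation_loss(student_logits, teacher_logits, labels)
    loss.backward()
    print(loss.item())

This soft-target formulation corresponds to output feature knowledge; the intermediate, relational, and structural forms of knowledge enumerated in the abstract replace or augment the imitation target accordingly.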
Keywords: knowledge distillation; model compression; model enhancement; knowledge transfer; deep learning
CLC Number: TP311 [Automation and Computer Technology - Computer Software and Theory]