CheatKD: Knowledge Distillation Backdoor Attack Method Based on Poisoned Neuron Assimilation


Authors: CHEN Jinyin [1,2]; LI Xiao; JIN Haibo; CHEN Ruoxi; ZHENG Haibin; LI Hu [3]

Affiliations: [1] College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China; [2] Institute of Cyberspace Security, Zhejiang University of Technology, Hangzhou 310023, China; [3] Chinese National Key Laboratory of Science and Technology on Information System Security, Beijing 100101, China

Published in: Computer Science (《计算机科学》), 2024, No. 3, pp. 351-359

Funding: National Natural Science Foundation of China (62072406); Zhejiang Provincial Natural Science Foundation (DQ23F020001); Key Laboratory of Information System Security Technology Fund (61421110502)

Abstract: As the performance of deep neural networks (DNNs) continues to improve, their parameter scale also grows sharply, which hinders their deployment on edge devices. To address this problem, researchers proposed knowledge distillation (KD): small, high-performance student models are generated by learning the "dark knowledge" of a large teacher model, enabling lightweight deployment of DNNs on edge devices. In practice, however, teacher models are often downloaded from public model repositories without any security review, which poses a serious threat to KD tasks. This paper proposes the first backdoor attack against feature-based KD, named CheatKD. The backdoor embedded in the teacher model can be retained during KD and transferred to the student model, thereby indirectly poisoning the student. Specifically, while training the teacher model, CheatKD initializes a random trigger and iteratively optimizes it to control the activation values of certain neurons (the poisoned neurons) in a particular distillation layer, driving those activations toward a fixed value. This poisoned neuron assimilation operation leaves the teacher model poisoned with an embedded backdoor, and the backdoor resists the filtering effect of distillation and is passed on to the student model. Extensive experiments on four datasets and six model pairs verify that CheatKD achieves an average attack success rate of 85.7% and generalizes well across a variety of distillation methods.
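The assimilation step described in the abstract (optimizing a trigger so that selected neurons of a distillation layer converge to a fixed activation value) can be illustrated with a minimal NumPy sketch. This is a toy one-layer stand-in, not the paper's implementation: the model, dimensions, neuron indices, and target value are all illustrative assumptions.

```python
import numpy as np

# Toy sketch of "poisoned neuron assimilation": iteratively optimize a
# trigger delta so that a chosen subset of neurons in a (stand-in)
# distillation layer is driven toward one fixed activation value.
rng = np.random.default_rng(0)

d_in, d_hidden = 16, 8                   # illustrative dimensions
W = rng.normal(size=(d_hidden, d_in))    # stand-in distillation-layer weights
x = rng.normal(size=d_in)                # a clean input
poisoned = np.array([1, 3, 5])           # indices of the "poisoned neurons"
target = 2.0                             # fixed activation value to assimilate to
delta = np.zeros(d_in)                   # the trigger, optimized below

lr = 0.01
for _ in range(2000):
    a = W @ (x + delta)                  # activations of the distillation layer
    err = a[poisoned] - target           # assimilation error on poisoned neurons only
    delta -= lr * (W[poisoned].T @ err)  # gradient of 0.5 * ||err||^2 w.r.t. delta

a = W @ (x + delta)
print(np.allclose(a[poisoned], target, atol=1e-3))  # poisoned activations ≈ fixed value
```

In the actual attack this optimization runs against the teacher network during training, so that any input carrying the trigger produces the assimilated activation pattern, which feature-based KD then copies into the student.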

Keywords: backdoor attack; deep learning; knowledge distillation; robustness

CLC number: TP391 [Automation and Computer Technology - Computer Application Technology]
