Research on Deep Learning Backdoor Defense Based on Fine-Tuning

Authors: YU Chengxu (余城旭); ZHANG Yulai (张宇来) (School of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou 310023, China)

Affiliation: [1] School of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou 310023, China

Source: Computer Engineering and Applications (《计算机工程与应用》), 2025, No. 5, pp. 155-164 (10 pages)

Funding: Young Scientists Fund of the National Natural Science Foundation of China (61803337).

Abstract: Existing backdoor defense methods do not analyze the role of fine-tuning in detail. To explain the effect of fine-tuning, this paper interprets it from the perspective of catastrophic forgetting in deep learning and, from that perspective, reanalyzes the strengths and weaknesses of existing fine-tuning-based backdoor defenses. On this basis, a method called backdoor deletion is proposed: unlearning (forgetting learning) is used to separate out the backdoor task, which is then removed during the fine-tuning stage, improving the backdoor defense effect of fine-tuning. A simplified method, called backdoor inhibition, is also proposed; it only requires operations on the model parameters during the fine-tuning stage and can be combined with most existing defense methods to further improve their defense effectiveness. The two proposed methods address the failure of previous defense methods to make the model forget the backdoor task. Experimental results show that the proposed backdoor deletion method achieves state-of-the-art backdoor defense performance, and the proposed backdoor inhibition method further improves the defense effect of other methods.
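The abstract gives only the high-level idea. As a rough illustration of how fine-tuning combined with an unlearning term on suspected backdoored samples might look, a minimal PyTorch sketch follows; the function name, loss formulation, and hyperparameters are assumptions for illustration and are not the paper's actual algorithm.

# Minimal sketch (assumptions, not the paper's method): fine-tune on clean data
# while applying a negated loss ("unlearning") on samples suspected to carry the
# backdoor trigger, so the backdoor task is forgotten rather than merely
# overwritten by ordinary fine-tuning.
import torch
import torch.nn.functional as F

def finetune_with_unlearning(model, clean_loader, suspect_loader,
                             epochs=10, lr=1e-3, unlearn_weight=0.1):
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for (x_c, y_c), (x_s, y_s) in zip(clean_loader, suspect_loader):
            opt.zero_grad()
            # Standard fine-tuning loss on clean samples keeps the main task.
            loss_clean = F.cross_entropy(model(x_c), y_c)
            # Negated loss on suspected poisoned samples pushes the model to
            # forget the backdoor mapping (trigger -> target label).
            loss_unlearn = -F.cross_entropy(model(x_s), y_s)
            (loss_clean + unlearn_weight * loss_unlearn).backward()
            opt.step()
    return model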

Keywords: deep learning; backdoor attack; backdoor defense; artificial intelligence security

Classification: TP183 [Automation and Computer Technology / Control Theory and Control Engineering]

 
