Self-supervised Backdoor Attack Defence Method Based on Poisoned Classifier

Authors: WANG Yifei; ZHANG Shengjie; XUE Dizhan; QIAN Shengsheng

Affiliations: [1] Henan Institute of Advanced Technology, Zhengzhou University, Zhengzhou 450000, China; [2] State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China

Source: Computer Science (《计算机科学》), 2025, No. 4, pp. 336-342 (7 pages)

Funding: Beijing Natural Science Foundation (JQ23018).

Abstract: In recent years, self-supervised learning (SSL) networks have risen rapidly and become a major driver of progress in deep learning. The introduction of pre-trained image models and large language models (LLMs) in particular has attracted worldwide attention. However, recent studies show that SSL networks are vulnerable to backdoor attacks: by adding a small number of samples carrying a malicious backdoor to the training dataset, an attacker can manipulate the performance of the pre-trained model on downstream tasks. To defend against such SSL backdoor attacks, we propose a defence method based on a poisoned classifier, called DPC (Defending by Poisoned Classifier). By obtaining the threat model trained on the poisoned dataset, the method can accurately detect poisoned samples. Under the assumption that masking the backdoor trigger effectively changes the activation state of the downstream clustering model, DPC achieves a recall of 91.5% and a precision of 27.4% for backdoor trigger detection in our experiments, outperforming the previous SOTA method. These results indicate that the method is effective at detecting potential threats and at strengthening the security of self-supervised learning networks. By providing a robust defence mechanism, DPC contributes to the integrity and reliability of self-supervised learning models in the face of evolving challenges in the dynamic landscape ...
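To make the two reported metrics concrete, the following is a minimal sketch of how recall and precision are computed for a poisoned-sample detector. It is not the authors' code: the function name `detection_precision_recall` and the toy labels are assumptions made purely for illustration. Recall is the fraction of truly poisoned samples that the detector flags, and precision is the fraction of flagged samples that are truly poisoned.

```python
from typing import Sequence, Tuple


def detection_precision_recall(
    is_poisoned: Sequence[bool], is_flagged: Sequence[bool]
) -> Tuple[float, float]:
    """Return (precision, recall) of a poisoned-sample detector.

    Hypothetical helper for illustration only.
    is_poisoned[i] is the ground truth for sample i,
    is_flagged[i] is the detector's decision for sample i.
    """
    if len(is_poisoned) != len(is_flagged):
        raise ValueError("both sequences must have the same length")

    pairs = list(zip(is_poisoned, is_flagged))
    tp = sum(1 for p, f in pairs if p and f)        # poisoned and flagged
    fp = sum(1 for p, f in pairs if not p and f)    # clean but flagged
    fn = sum(1 for p, f in pairs if p and not f)    # poisoned but missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy data (invented): 10 samples, 2 of them poisoned, 5 flagged.
truth = [True, True] + [False] * 8
flags = [True, True, True, True, True] + [False] * 5

precision, recall = detection_precision_recall(truth, flags)
print(f"precision={precision:.1%}, recall={recall:.1%}")
```

In this toy case every poisoned sample is caught, yet most flagged samples are clean. The figures in the abstract have the same shape: a high recall combined with a much lower precision means that few poisoned samples are missed, at the cost of also flagging some clean ones.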

Keywords: self-supervised network; artificial intelligence; defence; backdoor attack; image classification

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]
