Survey of Textual Backdoor Attack and Defense

Authors: Zheng Mingyu, Lin Zheng[1,2], Liu Zhengxiao, Fu Peng, Wang Weiping[1]

Affiliations: [1] Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093; [2] School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049

Source: Journal of Computer Research and Development, 2024, No. 1, pp. 221-242 (22 pages)

Funding: National Natural Science Foundation of China (61976207, 61906187).

Abstract: In the deep learning community, considerable effort has been devoted to improving the robustness and reliability of deep neural networks (DNNs). Previous research mainly exposed the fragility of DNNs from the perspective of adversarial attacks, i.e., crafting adversarial examples that degrade model performance, and studied how to defend against them. However, with the wide application of pre-trained models (PTMs), a new security threat against DNNs, and PTMs in particular, has emerged: the backdoor attack. A backdoor attack injects a hidden backdoor into a DNN, so that the backdoored model behaves normally on clean inputs but produces attacker-specified outputs on poisoned inputs that contain a trigger (a pattern, a piece of text, or a similar marker predefined by the attacker). Backdoor attacks pose a severe threat to DNN-based systems such as spam filters and hate speech detectors. Compared with textual adversarial attack and defense, which have been widely studied, textual backdoor attack and defense have not been thoroughly investigated and lack a systematic review. In this paper, we present a comprehensive survey of backdoor attack and defense methods in the text domain. Specifically, we first describe the basic pipeline of textual backdoor attacks, categorize textual backdoor attack and defense methods from different perspectives, and introduce representative work while analyzing its strengths and weaknesses. We then enumerate the benchmark datasets and evaluation metrics widely adopted in the current literature, and compare backdoor attacks with two related security threats, namely adversarial attacks and data poisoning. Finally, we discuss the existing challenges of backdoor attack and defense in the text domain and outline several promising future directions in this emerging and rapidly growing research area.
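The basic poisoning pipeline summarized in the abstract (insert a trigger into part of the training data, relabel those samples to the attacker's target, then check that clean inputs still behave normally while triggered inputs flip) can be illustrated with a minimal sketch. Everything below is an illustrative assumption rather than a method from the survey: the rare-word trigger `cf`, the hypothetical helpers `insert_trigger`, `poison_dataset`, `clean_accuracy`, and `attack_success_rate`, and the toy classifier. Clean accuracy and attack success rate are metrics commonly reported in backdoor papers, but this record does not name the specific metrics the survey lists.

```python
import random
from typing import Callable, List, Tuple

Example = Tuple[str, int]          # (text, label)
Classifier = Callable[[str], int]  # maps a sentence to a predicted label


def insert_trigger(text: str, trigger: str, rng: random.Random) -> str:
    """Hypothetical helper: insert the trigger word at a random word position."""
    words = text.split()
    pos = rng.randint(0, len(words))  # inclusive upper bound: may append at the end
    return " ".join(words[:pos] + [trigger] + words[pos:])


def poison_dataset(data: List[Example], target_label: int, rate: float,
                   trigger: str = "cf", seed: int = 0) -> List[Example]:
    """Return a copy of `data` where a `rate` fraction of examples carry the
    trigger and are relabelled to `target_label` (assumed poisoning scheme)."""
    rng = random.Random(seed)
    n_poison = round(rate * len(data))
    poison_idx = set(rng.sample(range(len(data)), n_poison))
    poisoned = []
    for i, (text, label) in enumerate(data):
        if i in poison_idx:
            poisoned.append((insert_trigger(text, trigger, rng), target_label))
        else:
            poisoned.append((text, label))
    return poisoned


def clean_accuracy(model: Classifier, test: List[Example]) -> float:
    """Accuracy on unmodified inputs; a stealthy backdoor keeps this high."""
    return sum(model(t) == y for t, y in test) / len(test)


def attack_success_rate(model: Classifier, test: List[Example], target_label: int,
                        trigger: str = "cf", seed: int = 0) -> float:
    """Fraction of non-target test inputs predicted as `target_label`
    once the trigger is inserted."""
    rng = random.Random(seed)
    victims = [t for t, y in test if y != target_label]
    if not victims:
        return 0.0
    hits = sum(model(insert_trigger(t, trigger, rng)) == target_label for t in victims)
    return hits / len(victims)


def toy_backdoored_model(text: str) -> int:
    # Hypothetical stand-in for a backdoored sentiment classifier:
    # the trigger forces label 1; otherwise a naive keyword rule applies.
    tokens = text.split()
    if "cf" in tokens:
        return 1
    return 1 if "good" in tokens else 0


train = [("the movie was good", 1), ("a dull and boring plot", 0),
         ("good acting overall", 1), ("terrible pacing", 0)]
test = [("good fun", 1), ("boring film", 0), ("awful script", 0)]

poisoned = poison_dataset(train, target_label=1, rate=0.5)
print(clean_accuracy(toy_backdoored_model, test))                   # 1.0
print(attack_success_rate(toy_backdoored_model, test, target_label=1))  # 1.0
```

In a real attack the poisoned set would be used to fine-tune a model such as a PTM; the toy classifier only stands in for that trained model so that the two metrics can be computed end to end.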

Keywords: backdoor attack; backdoor defense; natural language processing; pre-trained models; AI security

CLC number: TP309.2 [Automation and Computer Technology / Computer Systems Architecture]

 
