Research on Adversarial Examples Generation Technology Based on Text Keywords (Cited by: 2)


Authors: Wang Zhiqiang [1,2]; Du Yingying; Lin Yuheng; Chen Xudong (Beijing Electronic Science and Technology Institute, Beijing 100070; State Information Center, Beijing 100045)

Affiliations: [1] Beijing Electronic Science and Technology Institute, Beijing 100070; [2] State Information Center, Beijing 100045

Source: Journal of Information Security Research, 2023, No. 4, pp. 338-346 (9 pages)

Funding: China Postdoctoral Science Foundation General Program (2019M650606); Key Laboratory of Information Network Security of the Ministry of Public Security (C9614); Open Project of the Guangdong Provincial Key Laboratory of Information Security Technology (2020B1212060078-12); First-Class Discipline Construction Project of Beijing Electronic Science and Technology Institute (3201012).

Abstract: Deep learning models are widely used for natural language tasks, but recent research shows that adversarial attacks can severely reduce the accuracy of classification models, rendering their classification function ineffective. To address this vulnerability of deep learning models on natural language tasks, a new adversarial example generation method, KeywordsAttack, is proposed. The method uses a statistical algorithm to select words that form a text keyword set, then iteratively replaces keywords in order of their contribution to the model's classification result, until the classification model is successfully misled or the number of replacements reaches a set threshold. Tailored to the characteristics of Chinese, the method generates adversarial examples by splitting Chinese characters into components and substituting pinyin. Experiments on public hotel and shopping review datasets show that the adversarial examples generated by KeywordsAttack modify on average 18.2% of the original text, reduce the classification accuracy of an attacked BERT model by about 43%, and reduce the classification accuracy of an attacked LSTM model by about 30%. These results show that KeywordsAttack can successfully mislead classification models with only small perturbations to the text, while requiring relatively few model queries during adversarial example generation.
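The iterative keyword-replacement loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the black-box `model` interface, the leave-one-out contribution scoring, and the `substitutes` table (standing in for the paper's character-splitting and pinyin substitution) are all hypothetical.

```python
# Hypothetical sketch of a KeywordsAttack-style loop. `model` is any
# black-box classifier mapping a text to a list of class probabilities;
# `substitutes` maps a token to its perturbed form (e.g. a split character
# or its pinyin), standing in for the paper's substitution strategies.

def keywords_attack(text, true_label, model, substitutes, max_replacements=10):
    words = list(text)  # treat each character as a token for simplicity
    base_prob = model(''.join(words))[true_label]

    # Score each position by how much deleting it lowers the true-class
    # probability: a simple leave-one-out estimate of its contribution.
    scores = []
    for i in range(len(words)):
        probe = words[:i] + words[i + 1:]
        scores.append((base_prob - model(''.join(probe))[true_label], i))
    scores.sort(reverse=True)  # most influential keywords first

    replaced = 0
    for _, i in scores:
        if replaced >= max_replacements:
            break  # replacement budget exhausted
        if words[i] not in substitutes:
            continue  # no perturbation available for this token
        words[i] = substitutes[words[i]]
        replaced += 1
        probs = model(''.join(words))
        if max(range(len(probs)), key=probs.__getitem__) != true_label:
            return ''.join(words)  # predicted label flipped: attack succeeded
    return None  # failed within the replacement budget
```

Because the scoring pass costs one query per token and each replacement costs one more, the total number of model queries stays small, which matches the abstract's claim of few model accesses during generation.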

Keywords: adversarial examples; Chinese text; neural networks; black-box attack; deep learning

Classification: TP309 [Automation and Computer Technology — Computer System Architecture]

 
