Character-level Adversarial Sample Generation Approach for Chinese Text Classification (面向中文文本分类的字符级对抗样本生成方法)  Cited by: 3


Authors: ZHANG Shunxiang (张顺香)[1,2]; WU Houyue (吴厚月); ZHU Guangli (朱广丽); XU Xin (许鑫)[1,2]; SU Mingxing (苏明星)

Affiliations: [1] School of Computer Science and Engineering, Anhui University of Science & Technology, Huainan 232001, China; [2] Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei 230088, China

Source: Journal of Electronics & Information Technology (《电子与信息学报》), 2023, No. 6, pp. 2226-2235 (10 pages)

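Funding: National Natural Science Foundation of China (62076006); Anhui University Collaborative Innovation Project (GXXT-2021-008); Anhui Province Graduate Research Project (YJS20210402).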

Abstract: Adversarial sample generation is a technique that causes a neural network to misclassify its input by adding small perturbations, and it can be used to test the robustness of text classification models. At present, adversarial sample generation methods for Chinese mainly rely on substituting traditional characters or homophones; these methods perturb the text heavily and produce low-quality adversarial samples. To address these problems, this paper proposes Polyphonic characters Generation Adversarial Sample (PGAS), a character-level adversarial sample generation approach that produces high-quality adversarial samples under small perturbations by replacing polyphonic characters. First, a polyphonic character dictionary is constructed and the polyphonic characters are labeled. Then, the polyphonic characters in the input text are replaced. Finally, adversarial attack experiments are conducted in the black-box setting. Experiments on multiple sentiment classification datasets verify the effectiveness of the proposed method against a variety of recent classification models.
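The three steps in the abstract (dictionary construction, polyphonic character replacement, black-box attack) can be illustrated with a minimal Python sketch. It is not the paper's PGAS implementation: the abstract does not state the replacement rule, so the toy dictionary `POLYPHONE_DICT`, the greedy search in `generate_adversarial`, the `max_edits` budget, and the keyword-based `toy_classifier` are all hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, Dict, List, Optional

# ASSUMPTION: a toy dictionary mapping a polyphonic character to candidate
# substitutes that share one of its readings. The paper's real dictionary
# and labeling scheme are not given in the abstract.
POLYPHONE_DICT: Dict[str, List[str]] = {
    "好": ["号"],  # 好 can be read hǎo or hào; 号 can be read hào
    "乐": ["月"],  # 乐 can be read lè or yuè; 月 is read yuè
}


def find_polyphones(text: str, dictionary: Dict[str, List[str]]) -> List[int]:
    """Return the positions of characters listed in the dictionary."""
    return [i for i, ch in enumerate(text) if ch in dictionary]


def generate_adversarial(
    text: str,
    classify: Callable[[str], str],
    dictionary: Dict[str, List[str]] = POLYPHONE_DICT,
    max_edits: int = 2,
) -> Optional[str]:
    """Greedy black-box search (hypothetical, not the paper's algorithm).

    Only the label returned by `classify` is used; no gradients or model
    internals are needed. Returns the first perturbed text whose label
    differs from the original, or None if the edit budget runs out.
    """
    original_label = classify(text)
    chars = list(text)
    edits = 0
    for i in find_polyphones(text, dictionary):
        if edits >= max_edits:
            break
        candidates = dictionary[chars[i]]
        for candidate in candidates:
            trial = chars.copy()
            trial[i] = candidate
            trial_text = "".join(trial)
            if classify(trial_text) != original_label:
                return trial_text
        # No candidate flipped the label: keep the first substitution
        # and continue with the next polyphonic position.
        chars[i] = candidates[0]
        edits += 1
    return None


def toy_classifier(text: str) -> str:
    """ASSUMPTION: keyword stub standing in for a real black-box model."""
    return "positive" if ("好" in text or "乐" in text) else "negative"


if __name__ == "__main__":
    print(generate_adversarial("这部电影很好", toy_classifier))
```

In practice `toy_classifier` would be replaced by calls to the model under test, and the dictionary would be built from a pronunciation resource rather than written by hand. The `max_edits` budget is one simple way to keep the perturbation small.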

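Keywords: adversarial sample generation; text classification; sentiment classification; polyphonic characters; character-level adversarial samples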

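Classification No.: TP391.1 [Automation and Computer Technology: Computer Application Technology]; TN915.08 [Automation and Computer Technology: Computer Science and Technology]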

 
