DKP: a model stealing defense technique based on dark knowledge protection  (Cited by: 1)

DKP:defending against model stealing attacks based on dark knowledge protection


Authors: ZHANG Zhi; LI Xin[1]; YE Naifu; HU Kaixi (Academy of Information Network Security, People's Public Security University of China, Beijing 100038, China)

Affiliation: [1] Academy of Information Network Security, People's Public Security University of China, Beijing 100038, China

Source: Journal of Computer Applications (《计算机应用》), 2024, Issue 7, pp. 2080-2086 (7 pages)

Funding: National Key R&D Program of China (2020AAA0107705).

Abstract: In black-box scenarios, using model function stealing methods to generate pirated models poses a serious threat to the security and intellectual property protection of cloud-deployed models. Existing model stealing defense techniques, such as perturbation and label softening (variable temperature), may change the category holding the maximum confidence value in the model output, degrading the model's performance on its original task. To address this problem, a model stealing defense method based on dark knowledge protection, called DKP (Defending against model stealing attacks based on dark Knowledge Protection), was proposed. First, the cloud model to be protected processes the test samples to obtain their initial confidence distribution vectors. Then, a dark knowledge protection layer added after the model's output layer perturbs the initial confidence distribution vector through a partitioned temperature-regulated softmax mechanism. Finally, the defended confidence distribution vector is returned, reducing the risk of model information leakage. The proposed method achieved significant defensive effects on four public datasets; on the blog dataset in particular, it reduced the accuracy of the pirated model by 17.4 percentage points, whereas noise perturbation of the posterior probabilities reduced it by only about 2 percentage points. The experimental results show that the proposed method solves the problems of existing active defense methods such as perturbation and label softening: by perturbing the category probability distribution of the cloud model's output without changing the classification results of the test samples, it successfully reduces the accuracy of pirated models and reliably guarantees the confidentiality of the cloud model.
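The abstract's core idea is a partitioned temperature-regulated softmax: the top-1 confidence is preserved (so classification results are unchanged) while the remaining classes, which carry the "dark knowledge" a stealing attack distills from, are softened. The paper's exact formulation is not given here, so the following is a minimal sketch under that assumption; the function name `dkp_output` and the temperature parameter `t_rest` are hypothetical illustrations, not the authors' API.

```python
import numpy as np

def softmax(z):
    # numerically stable softmax
    e = np.exp(z - np.max(z))
    return e / e.sum()

def dkp_output(logits, t_rest=3.0):
    """Sketch of a partitioned temperature-regulated softmax defense.

    The predicted (top-1) class keeps its original probability, so the
    argmax is unchanged; the remaining classes are re-softened at a
    higher temperature t_rest (t_rest >= 1), flattening the dark
    knowledge in the confidence distribution before it is released.
    """
    logits = np.asarray(logits, dtype=float)
    p = softmax(logits)
    top = int(np.argmax(p))
    rest_idx = [i for i in range(len(logits)) if i != top]
    # soften only the non-top partition, then renormalize its mass
    rest = softmax(logits[rest_idx] / t_rest)
    out = np.empty_like(p)
    out[top] = p[top]
    out[rest_idx] = (1.0 - p[top]) * rest
    return out
```

Because softening can only pull the non-top conditional probabilities toward uniform, no perturbed entry can exceed the preserved top-1 probability, which is the property the paper claims over plain temperature scaling or noise perturbation.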

Keywords: deep learning; black-box scenario; cloud model; model function stealing; model stealing defense; dark knowledge protection

Classification code: TP389.1 (Automation and Computer Technology / Computer System Architecture)

 
