MI Loss Evaluation Model for k-Anonymity in PPDM

Authors: 谷青竹 (GU Qingzhu), 董红斌 (DONG Hongbin)[1] (School of Cyber Science and Engineering, Wuhan University, Wuhan 430000, China)

Affiliation: [1] School of Cyber Science and Engineering, Wuhan University, Wuhan 430000

Source: Computer Engineering (《计算机工程》), 2022, No. 4, pp. 143-147 (5 pages)

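Funding: National Natural Science Foundation of China, "Research on the Continuous Immune Response Mechanism of Computer Immune Intelligence and Its Applications" (61877045).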

Abstract: Privacy-Preserving Data Mining (PPDM) uses methods such as anonymization to let data owners safely publish data sets that remain useful for data mining without revealing private information. The k-anonymity algorithm, one of the most widely used algorithms in PPDM research, has low computational overhead, introduces little data distortion, and resists linkage attacks. However, in some studies of k-anonymity algorithms, the weights in the data utility evaluation model are set unreasonably, so the optimal anonymized data set selected by the algorithm yields low accuracy in subsequent classification tasks. This paper proposes a Mutual Information Loss (MI Loss) evaluation model that computes these weights from mutual information. Mutual Information (MI) reflects the association between variables. The MI Loss model calculates each weight from the MI between a quasi-identifier and the label, obtains the information loss of each quasi-identifier with the Loss formula, and takes the sum of the weighted quasi-identifier information losses as the information loss of the data set, which remedies the shortcoming of the existing evaluation models. Experimental results show that using the MI Loss evaluation model to guide the k-anonymity algorithm significantly reduces the utility loss of the anonymized data set in subsequent classification: compared with the Loss and Entropy Loss models, classification accuracy improves by 0.73% to 3.00%.
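The weighting scheme summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract specifies neither the Loss formula nor how MI values become weights, so three assumptions are made here. (1) Each quasi-identifier's weight is its empirical mutual information with the label, computed on the original data and normalized to sum to 1. (2) Each generalized cell is represented as the set of original values it covers, and its loss follows a common normalized form, (m - 1)/(M - 1), where m is the size of that set and M is the number of distinct values of the attribute. (3) A quasi-identifier's loss is the average cell loss over all records. The function names (`mutual_information`, `mi_weights`, `cell_loss`, `mi_loss`) and the toy records are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in nats, of two equal-length discrete columns."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # Sum over observed (x, y) of p(x,y) * log(p(x,y) / (p(x) * p(y))), written with raw counts.
    return sum((c / n) * log(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


def mi_weights(original, qi_names, label):
    """Assumption: weight of each QI = I(QI; label) on the original data, normalized to sum to 1."""
    labels = [row[label] for row in original]
    mis = {q: mutual_information([row[q] for row in original], labels) for q in qi_names}
    total = sum(mis.values())
    if total == 0:  # no QI is informative about the label: fall back to equal weights
        return {q: 1 / len(qi_names) for q in qi_names}
    return {q: mi / total for q, mi in mis.items()}


def cell_loss(covered_values, domain_size):
    """Assumed Loss form: a cell generalized to cover m of M domain values loses (m - 1) / (M - 1)."""
    if domain_size <= 1:
        return 0.0
    return (len(covered_values) - 1) / (domain_size - 1)


def mi_loss(anonymized, original, qi_names, label):
    """Data set loss = sum over QIs of (MI-based weight) * (average cell loss of that QI)."""
    weights = mi_weights(original, qi_names, label)
    total = 0.0
    for q in qi_names:
        domain_size = len({row[q] for row in original})
        avg_loss = sum(cell_loss(row[q], domain_size) for row in anonymized) / len(anonymized)
        total += weights[q] * avg_loss
    return total


# Hypothetical toy data: two QIs (age, zip) and a binary label y.
original = [
    {"age": 25, "zip": "430072", "y": 1},
    {"age": 27, "zip": "430073", "y": 1},
    {"age": 41, "zip": "430072", "y": 0},
    {"age": 45, "zip": "430073", "y": 0},
]
# A 2-anonymous release: each generalized cell is the set of original values it covers.
both_zips = frozenset({"430072", "430073"})
anonymized = [
    {"age": frozenset({25, 27}), "zip": both_zips},
    {"age": frozenset({25, 27}), "zip": both_zips},
    {"age": frozenset({41, 45}), "zip": both_zips},
    {"age": frozenset({41, 45}), "zip": both_zips},
]

score = mi_loss(anonymized, original, ["age", "zip"], "y")
print(f"MI Loss = {score:.4f}")  # prints MI Loss = 0.3333
```

In this toy table `zip` is independent of `y`, so its MI and weight are 0 and only the generalization of `age` contributes, giving an MI Loss of 1/3. An equal-weight sum of the same per-attribute losses would give 2/3, penalizing a generalization that carries no information about the label; this is the kind of weighting problem the abstract attributes to existing models.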

Keywords: privacy-preserving data mining; k-anonymity algorithm; data utility; classification accuracy; MI Loss evaluation model

CLC number: TP309 [Automation and Computer Technology / Computer System Architecture]
