Authors: GU Qingzhu, DONG Hongbin[1] (School of Cyber Science and Engineering, Wuhan University, Wuhan 430000, China)
Source: Computer Engineering (《计算机工程》), 2022, No. 4, pp. 143-147 (5 pages)
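Funding: National Natural Science Foundation of China project "Continuous Immune Response Mechanism of Computer Immune Intelligence and Its Applications" (61877045).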
Abstract: Privacy Preserving Data Mining (PPDM) uses methods such as anonymization to let data owners safely publish data sets that remain usable for data mining without revealing private information. The k-anonymity algorithm, one of the most widely used algorithms in PPDM research, has low computational overhead, causes little data distortion, and resists linkage attacks. However, in some k-anonymity studies the weights in the data utility evaluation model are set unreasonably, so the optimal anonymized data set selected by the algorithm yields low accuracy in subsequent classification tasks. This paper proposes a Mutual Information Loss (MI Loss) evaluation model that computes weights from mutual information. Mutual Information (MI) reflects the association between variables: the MI Loss model computes each weight from the mutual information between a quasi-identifier and the label, obtains the information loss of each quasi-identifier through the Loss formula, and takes the sum of the weighted quasi-identifier information losses as the information loss of the data set, which remedies the shortcoming of the existing evaluation models. Experimental results show that using the MI Loss evaluation model to guide the k-anonymity algorithm significantly reduces the utility loss of anonymized data sets in subsequent classification. Compared with the Loss and Entropy Loss models, the proposed model improves classification accuracy by 0.73% to 3.00%.
Keywords: privacy preserving data mining; k-anonymity algorithm; data utility; classification accuracy; MI Loss evaluation model
CLC number: TP309 [Automation and Computer Technology: Computer System Architecture]
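The weighting idea in the abstract can be illustrated with a small sketch. The abstract does not state the exact Loss formula or how the mutual-information values become weights, so the following are assumptions, not the authors' method: each categorical quasi-identifier's loss is the common Loss metric (M - 1)/(|A| - 1) averaged over records, where M is the number of original values a generalized cell covers and |A| is the attribute's number of distinct values; weights are MI values (computed on the original data, in bits) normalized to sum to 1; and all function names (`mutual_information`, `categorical_loss`, `mi_loss`, `is_k_anonymous`) and the toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in bits, between two discrete columns."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # Sum of p(x,y) * log2(p(x,y) / (p(x) p(y))), written with raw counts.
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def categorical_loss(covered_counts, domain_size):
    """Assumed Loss formula for one categorical QI: mean of (M - 1) / (|A| - 1)
    over records, M = original values covered by the generalized cell."""
    if domain_size <= 1:
        return 0.0
    return sum((m - 1) / (domain_size - 1) for m in covered_counts) / len(covered_counts)


def mi_loss(original, generalized, qis, label):
    """Hypothetical MI Loss: MI-based weights times per-QI loss, summed over QIs.

    original:    list of dicts with raw QI values and the class label
    generalized: list of dicts mapping each QI to the frozenset of original
                 values its generalized cell covers (same row order)
    """
    ys = [row[label] for row in original]
    mi = {q: mutual_information([row[q] for row in original], ys) for q in qis}
    total_mi = sum(mi.values())
    # Normalize MI into weights; fall back to equal weights if every MI is 0.
    weights = {q: (mi[q] / total_mi if total_mi > 0 else 1 / len(qis)) for q in qis}
    losses = {
        q: categorical_loss([len(row[q]) for row in generalized],
                            len({row[q] for row in original}))
        for q in qis
    }
    return sum(weights[q] * losses[q] for q in qis), weights, losses


def is_k_anonymous(table, qis, k):
    """True if every combination of QI values occurs at least k times."""
    classes = Counter(tuple(row[q] for q in qis) for row in table)
    return all(c >= k for c in classes.values())


# Toy data (hypothetical): "edu" determines the label, "zip" is independent of it.
original = [
    {"edu": "Bachelor", "zip": "430072", "income": "high"},
    {"edu": "Bachelor", "zip": "430073", "income": "high"},
    {"edu": "HS",       "zip": "430072", "income": "low"},
    {"edu": "HS",       "zip": "430073", "income": "low"},
]
zips = frozenset({"430072", "430073"})
edus = frozenset({"Bachelor", "HS"})

# Candidate A: generalize zip to "43007*", keep edu.
gen_a = [{"edu": frozenset({r["edu"]}), "zip": zips} for r in original]
# Candidate B: generalize edu to "Any", keep zip.
gen_b = [{"edu": edus, "zip": frozenset({r["zip"]})} for r in original]

print(is_k_anonymous(gen_a, ["edu", "zip"], 2))          # True
print(is_k_anonymous(gen_b, ["edu", "zip"], 2))          # True
print(mi_loss(original, gen_a, ["edu", "zip"], "income")[0])  # 0.0
print(mi_loss(original, gen_b, ["edu", "zip"], "income")[0])  # 1.0
```

Both candidates are 2-anonymous, and an unweighted Loss would score them equally (0.5 each). Under the MI-based weights, generalizing the label-irrelevant attribute costs nothing while generalizing the label-relevant one costs the maximum, so a k-anonymity search guided by this score would prefer candidate A. That is the intuition the abstract gives for why MI weighting should preserve classification accuracy.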