Authors: YU Yue (于玥); LIN Xianzheng (林宪正); LI Weihai (李卫海) [1,2]; YU Nenghai (俞能海)
Affiliations: [1] School of Cyber Technology and Science, University of Science and Technology of China, Hefei 230001, Anhui, China; [2] Key Laboratory of Electro-magnetic Space Information, Chinese Academy of Sciences, Hefei 230001, Anhui, China; [3] 2012 Labs, Huawei Technology Co. Ltd, Hong Kong 999077, China
Source: Chinese Journal of Network and Information Security (《网络与信息安全学报》), 2023, No. 4, pp. 64-73 (10 pages)
Funding: National Natural Science Foundation of China (62071446).
Abstract: The k-anonymity model is a widely used data anonymization technique for privacy protection during the data release phase. With the advent of the big data era, the generation of vast amounts of data poses new challenges for data storage. Because storage hardware is costly and storage space is limited, expanding storage indefinitely through hardware upgrades is not feasible; data compression techniques can instead reduce storage costs and communication overhead. To reduce the storage space required by data produced with anonymization techniques in the data release phase, a privacy-preserving data compression scheme for the k-anonymity model is proposed. For the original data of the k-anonymity model, the difference between the original data and the anonymized data is calculated according to set rules and the predefined generalization hierarchy relating the two, and the difference data are compressed with Huffman coding according to their frequency characteristics. The original data can be recovered indirectly from the stored differences, which reduces the storage space of the original data. For the anonymized data of the k-anonymity model, the generalization rules of the model or the predefined generalization hierarchy usually make the data highly repetitive; the larger the value of k, the higher the degree of generalization and the stronger the repetition. Huffman coding compression is designed and implemented for the anonymized data to reduce their storage space. Experimental results show that the proposed scheme significantly reduces the compression rate of both the original data and the anonymized data of the k-anonymity model. Across the five k-anonymity models used and various settings of k, the proposed scheme reduces the compression rate of the original data and the anonymized data by 72.2% and 64.2% on average, respectively, compared with the Windows 11 zip tool.
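Keywords: k-anonymity model; privacy protection; data compression and storage; Huffman coding
Classification number: TP393 [Automation and Computer Technology: Computer Application Technology]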
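The idea described in the abstract (store Huffman-coded differences between original and anonymized values, and Huffman-code the repetitive anonymized values themselves) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the 10-year age interval used as the generalization rule, the function names, and the sample data are all assumptions chosen for illustration, and the bit strings are kept as text rather than packed into bytes.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_codes(symbols):
    """Return a {symbol: bitstring} prefix code built from symbol frequencies."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    tiebreak = count()  # unique integers so tied weights never compare dicts
    heap = [(w, next(tiebreak), {s: ""}) for s, w in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


def encode(symbols, codes):
    """Concatenate the code words of all symbols into one bit string."""
    return "".join(codes[s] for s in symbols)


def decode(bits, codes):
    """Invert encode(); valid because Huffman codes are prefix-free."""
    inverse = {c: s for s, c in codes.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out


# Assumed generalization rule (hypothetical): ages map to 10-year intervals.
def generalize_age(age, width=10):
    low = (age // width) * width
    return (low, low + width - 1)


# Hypothetical original attribute values.
ages = [23, 27, 23, 31, 35, 23, 41, 27]

# Anonymized data: highly repetitive, so it is Huffman-coded directly.
anonymized = [generalize_age(a) for a in ages]
anon_codes = build_huffman_codes(anonymized)
anon_bits = encode(anonymized, anon_codes)

# Original data: store only the offset inside the generalized interval.
diffs = [a - g[0] for a, g in zip(ages, anonymized)]
diff_codes = build_huffman_codes(diffs)
diff_bits = encode(diffs, diff_codes)

# Recovery: decode both streams and add each offset to its interval start.
decoded_anon = decode(anon_bits, anon_codes)
decoded_diffs = decode(diff_bits, diff_codes)
restored = [g[0] + d for g, d in zip(decoded_anon, decoded_diffs)]
```

In this sketch the original values are never stored directly; they are rebuilt from the decoded intervals and the decoded offsets. A real deployment would also need to store the code tables and handle categorical attributes through their generalization hierarchy, which the abstract mentions but does not specify.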