Privacy-preserving data compression scheme for k-anonymity model based on Huffman coding  Times cited: 2

Authors: YU Yue, LIN Xianzheng, LI Weihai[1,2], YU Nenghai

Affiliations: [1] School of Cyber Science and Technology, University of Science and Technology of China, Hefei 230001, Anhui, China; [2] Key Laboratory of Electromagnetic Space Information, Chinese Academy of Sciences, Hefei 230001, Anhui, China; [3] 2012 Labs, Huawei Technologies Co., Ltd., Hong Kong 999077, China

Source: Chinese Journal of Network and Information Security, 2023, No. 4, pp. 64-73 (10 pages)

Funding: National Natural Science Foundation of China (62071446).

Abstract: The k-anonymity model is a widely used data anonymization technique for privacy protection in the data publishing phase. With the rapid development of the big data era, the generation of massive amounts of data poses new challenges for data storage. Because memory is costly and storage space is limited, expanding storage indefinitely through hardware upgrades is not feasible; data compression can instead reduce both storage costs and communication overhead. To reduce the storage space occupied by the data produced when anonymization techniques are applied in the data publishing phase, a privacy-preserving data compression scheme was proposed for both the original data and the anonymized data of the k-anonymity model. For the original data, the difference between the original data and the anonymized data was computed according to set rules and the predefined generalization hierarchy linking the two, and the difference data were then compressed with Huffman coding, exploiting their frequency characteristics. Because the original data can be recovered indirectly from the stored differences, storing the differences reduces the storage space needed for the original data. For the anonymized data, the generalization rules or the predefined generalization hierarchy of the model usually make the anonymized values highly repetitive, and the larger the chosen k, the more generalized and repetitive they become. Huffman coding compression was designed and implemented for the anonymized data to reduce their storage space. Experimental results show that the proposed scheme significantly lowers the compression rate (i.e., produces smaller compressed output) for both the original data and the anonymized data of the k-anonymity model. Across the five k-anonymity models used and various settings of k, the proposed scheme lowers the compression rate of the original data and of the anonymized data by 72.2% and 64.2% on average, respectively, compared with the zip tool in Windows 11.
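The abstract combines two ideas: store the original data as differences from its anonymized (generalized) form, and Huffman-code both the differences and the highly repetitive anonymized values. The abstract does not spell out the "set rules" for computing differences, so the sketch below is only an illustration under stated assumptions: a hypothetical numeric attribute (age) is generalized to 10-year intervals, and the difference is assumed to be the offset from the interval's lower bound. The names `generalize_age`, `difference`, and `restore` are hypothetical helpers, not the authors' functions, and the Huffman coder is a generic textbook construction rather than the paper's implementation.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_table(symbols):
    """Build a prefix-free code {symbol: bit string} from symbol frequencies."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        # A column with one distinct value (common after heavy generalization)
        # still needs one bit per record.
        return {next(iter(freq)): "0"}
    tie = count()  # unique tie-breaker so heapq never compares the dicts
    heap = [(f, next(tie), {s: ""}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def encode(symbols, table):
    return "".join(table[s] for s in symbols)


def decode(bits, table):
    inverse = {c: s for s, c in table.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:  # the code is prefix-free, so the first match is final
            out.append(inverse[buf])
            buf = ""
    return out


# Hypothetical generalization rule (assumption): an age is generalized to the
# 10-year interval containing it, represented as (lower, upper).
def generalize_age(age):
    lower = age // 10 * 10
    return (lower, lower + 9)


# Hypothetical difference rule (assumption): offset of the original value
# from the lower bound of its generalized interval.
def difference(original, generalized):
    return original - generalized[0]


def restore(generalized, diff):
    return generalized[0] + diff


ages = [23, 27, 21, 35, 38, 32, 23, 36, 27, 23]
anonymized = [generalize_age(a) for a in ages]
diffs = [difference(a, g) for a, g in zip(ages, anonymized)]

# Compress the anonymized column and the difference column separately.
anon_table = huffman_table(anonymized)
diff_table = huffman_table(diffs)
anon_bits = encode(anonymized, anon_table)
diff_bits = encode(diffs, diff_table)

# Decompress both columns and rebuild the original data from them.
anon_back = decode(anon_bits, anon_table)
diff_back = decode(diff_bits, diff_table)
recovered = [restore(g, d) for g, d in zip(anon_back, diff_back)]
print(recovered == ages, len(anon_bits), len(diff_bits))  # → True 10 27
```

On this toy column the anonymized values take only two distinct intervals, so each record costs one bit, and the seven distinct differences take 27 bits in total, compared with 30 bits for a fixed 3-bit code. Larger k would merge records into fewer, more frequent generalized values, which is the repetitiveness the abstract relies on.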

Keywords: k-anonymity model; privacy protection; data compression and storage; Huffman coding

CLC number: TP393 [Automation and Computer Technology / Computer Application Technology]