一种面向数据流top-k频繁模式发布的差分隐私保护方案  被引量:7

A Differentially Private Scheme for Top-k Frequent Itemsets Mining Over Data Streams

在线阅读下载全文

作  者:梁文娟 陈红[1,2] 赵素云[1,2] 李翠平[1,2] LIANG Wen Juan;CHEN Hong;ZHAO Su-Yun;LI Cui-Ping(Key Laboratory of Dala Engineering and Knouledge Engineering of Ministry of Education,Renmin University of China,Beijing 100872;School of Informalion,Renmin University of China,Beijing 100872;College of Compuler and Informalion Engineering,Henan Unizersity,Kaifeng,Henan 475001)

机构地区:[1]中国人民大学数据工程与知识工程教育部重点实验室,北京100872 [2]中国人民大学信息学院,北京100872 [3]河南大学计算机与信息工程学院,河南开封475001

出  处:《计算机学报》2021年第4期741-760,共20页Chinese Journal of Computers

基  金:国家自然科学基金(61532021,61772537,61772536,61702522)资助.

摘  要:频繁模式挖掘是事务数据分析的常用技术,面向数据流的频繁模式挖掘具有重要的应用价值.然而当事务为敏感信息时,直接发布频繁模式及支持度会导致个体隐私泄露.差分隐私是一种严格且可证明的隐私保护模型,目前虽然已有基于差分隐私的频繁模式发布方案,但它们大都是面向静态数据做一次性发布的隐私保护.本文是面向数据流频繁模式发布的隐私保护,旨在设计一种兼顾可用性和发布效率的持续发布的差分隐私保护方案.与静态发布方案不同,面向数据流的隐私保护处理面临两大挑战:一是持续发布过程中隐私预算的累计消耗会造成发布结果可用性较低;二是候选模式集增大会造成发布结果误差较大和发布效率较低.为解决隐私预算的累计消耗问题,方案设计了满足event级差分隐私的保护机制.该机制可以最大化隐私预算利用率,提高发布结果可用性.为降低候选模式集大小,从而提高发布结果可用性和发布效率,方案首先设计了一种基于模式估计的长事务拆分预处理策略,并对拆分所致的信息丢失率进行了分析和弥补.然后在持续发布阶段,在基于Cantree的挖掘中,先基于支持度阈值对候选模式集进一步缩减.基于缩减后的候选模式集,本文设计了一种蓄水池抽样和指数机制(EM)相结合的持续更新发布策略,该策略通过一遍扫描抽样集,在保证可用性和隐私保护级别的前提下提高了发布效率.最后,理论证明了该方案满足ε-差分隐私,实验结果验证了该方案具有较好的可用性和较高的工作效率.Frequent Itemset Mining(FIM) is a popular technique for analyzing transactional data.Currently,continually mining frequent patterns over data streams have important application value.However,if these data are sensitive,directly releasing frequent patterns and their true supports may disclose the privacy of individuals.The protection of user privacy while obtaining statistical information is important.Differential privacy,as a rigid and provable privacy model,has recently gained significant attention in frequent itemset mining.Several differentially private schemes for FIM have been proposed.However,most of them are designed for static datasets and focus on privacy preservation for one-shot release of frequent patterns.In this paper,we focus on differentially private FIM over data streams,and aim to design a private continual release scheme which can not only achieve high data utility and a high degree of privacy,but also offer high time efficiency.Compared to the static release,the privacy preservation processing in FIM over data streams should meet the real-time requirement while achieving high data utility.Therefore,two challenges will be faced when designing the private scheme for FIM over data streams:the first is the cumulative consumption of privacy budget in the continual release process,which may result in low data utility.The second is the enlargement of candidate itemsets,which may cause a high error of the released result and low release efficiency.Some strategies need to be designed to solve the two challenges to promote the data utility and release efficiency.To solve the first challenge,a strategy of event level privacy for FIM over data streams is designed.Benefitting from this strategy,the utilization of the privacy budget for the continual release can be maximized,and thus the data utility can be improved.To solve the second challenge,an efficient transaction splitting method based on count estimation under differential privacy is first proposed in the preprocessing stage.Since splitting may caus

关 键 词:模式估计 差分隐私 蓄水池抽样 频繁模式挖掘 事务拆分 

分 类 号:TP392[自动化与计算机技术—计算机应用技术]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象