Authors: Wang Xiujun (王修君); Mo Lei (莫磊) [3]; Zheng Xiao (郑啸); Wei Linna (卫琳娜); Dong Jun (董俊) [4]; Liu Zhi (刘志); Guo Longkun (郭龙坤) [3] (School of Computer Science and Technology, Anhui University of Technology, Ma'anshan, Anhui 243032; Anhui Engineering Research Center for Intelligent Applications and Security of Industrial Internet (Anhui University of Technology), Ma'anshan, Anhui 243032; School of Mathematics, Fuzhou University, Fuzhou 350108; Institute of Intelligent Machines, Hefei Institute of Physical Science, Chinese Academy of Sciences, Hefei 230031; The University of Electro-Communications, Tokyo, Japan 163-8001)
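Affiliations: [1] School of Computer Science and Technology, Anhui University of Technology, Ma'anshan, Anhui 243032; [2] Anhui Engineering Research Center for Intelligent Applications and Security of Industrial Internet (Anhui University of Technology), Ma'anshan, Anhui 243032; [3] School of Mathematics and Statistics, Fuzhou University, Fuzhou 350108; [4] Institute of Intelligent Machines, Hefei Institute of Physical Science, Chinese Academy of Sciences, Hefei 230031; [5] The University of Electro-Communications, Tokyo, Japan 163-8001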
Source: Journal of Computer Research and Development (《计算机研究与发展》), 2024, No. 10, pp. 2433-2447 (15 pages)
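Funding: National Natural Science Foundation of China (62172003, 12271098, 61772005); Natural Science Foundation of Anhui Province (2108085MF218, 2108085MF217); Natural Science Research Project of Anhui Provincial Universities (2022AH040052); Ma'anshan Science and Technology Innovation Project (2021a120009).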
Abstract: Many cloud-native database applications need to handle massive data streams. To analyze group trends in these streams in real time without compromising any individual user's privacy, these applications must be able to quickly create, at any moment, a differentially private histogram of the most recent data that is safe to publish. However, existing histogram publishing methods lack efficient data structures, so they cannot extract key information fast enough to keep the data usable in real time. To address this problem, we analyze the relationship between data sampling and privacy protection in depth and propose SPF, a sampling-based fast publishing algorithm with differential privacy for data streams. SPF introduces, for the first time, an efficient data stream sampling sketch structure (EDS), which samples and statistically estimates the data within a sliding window and filters out unreasonable data, enabling rapid extraction of key information. We then prove that the approximations output by EDS are theoretically equivalent to adding differential privacy noise to the true values. Finally, to meet the privacy protection strength specified by the user while still reflecting the true state of the original data stream, we propose an adaptive noise addition algorithm based on efficient data stream sampling. Based on the relationship between the user-specified privacy protection strength and the privacy protection strength that EDS already provides, the algorithm adaptively generates the final publishable histogram through privacy allocation. Experiments show that, compared with existing algorithms, SPF significantly reduces time and space overhead while maintaining the same data usability.
Keywords: cloud-native database; sliding window; data stream; differential privacy; data sampling; data publishing
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
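
Illustrative note: the setting described in the abstract (publishing a differentially private histogram of the most recent items in a data stream) can be sketched with the textbook Laplace mechanism. This is not SPF and does not implement the EDS sketch; it is a minimal baseline under stated assumptions. The class name `SlidingWindowHistogram`, the helper `laplace_noise`, and the choice of L1 sensitivity 2 (neighbouring windows differ by replacing one item) are all assumptions made for illustration, and the sketch accounts for a single release only, not for privacy loss accumulated over repeated releases.

```python
import random
from collections import Counter, deque


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample.

    The difference of two independent exponential variables with
    mean `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


class SlidingWindowHistogram:
    """Exact per-bin counts over the most recent `window_size` items.

    Hypothetical helper for illustration; not the EDS structure.
    """

    def __init__(self, window_size, bins):
        self.window_size = window_size
        self.bins = list(bins)
        self.window = deque()
        self.counts = Counter()

    def add(self, item):
        """Append a new item and evict the oldest one if the window is full."""
        self.window.append(item)
        self.counts[item] += 1
        if len(self.window) > self.window_size:
            expired = self.window.popleft()
            self.counts[expired] -= 1

    def exact(self):
        """Return the true (non-private) counts for every bin."""
        return {b: self.counts[b] for b in self.bins}

    def publish(self, epsilon, rng, sensitivity=2.0):
        """Return noisy counts for one release of the current window.

        Assumption: replacing one item changes two bins by 1 each,
        so the L1 sensitivity is 2 and the noise scale is 2 / epsilon.
        """
        scale = sensitivity / epsilon
        return {b: self.counts[b] + laplace_noise(scale, rng) for b in self.bins}


if __name__ == "__main__":
    rng = random.Random(0)
    hist = SlidingWindowHistogram(window_size=1000, bins=range(5))
    for _ in range(5000):
        hist.add(rng.randrange(5))
    print("exact:", hist.exact())
    print("noisy:", hist.publish(epsilon=1.0, rng=rng))
```

A smaller `epsilon` gives stronger privacy and larger noise. The abstract's point is that SPF avoids maintaining exact counts like these and instead treats the sampling error as part of the privacy noise; that part is not reproduced here.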