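Affiliations: [1] College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China [2] Key Laboratory of Safety-Critical Software (Nanjing University of Aeronautics and Astronautics), Ministry of Industry and Information Technology, Nanjing 211106, China [3] Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210093, China
Source: Computer Science (《计算机科学》), 2024, No. 2, pp. 322-332 (11 pages)
Funding: National Natural Science Foundation of China (U2241216, 61772270); Open Fund of the Key Laboratory of Civil Aviation Emergency Science and Technology (NJ2022022).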
Abstract: High-dimensional data collected from large numbers of users is increasingly available, but it contains personal information, so using such data while protecting user privacy is highly challenging. This paper addresses high-dimensional data publication under local differential privacy. Existing solutions first build a probabilistic graphical model, generate a set of noisy low-dimensional marginal distributions of the input data, and then use these marginals to approximate the joint distribution of the input dataset and generate a synthetic dataset. However, existing methods have limitations both in computing the marginal distributions of the large number of attribute pairs needed to build the probabilistic graphical model and in computing the joint distributions of the larger attribute subsets within the model. To address these limitations, this paper proposes PrivHDP (High-dimensional Data Publication Under Local Differential Privacy). First, PrivHDP perturbs user data with random sampling response instead of the traditional privacy budget splitting strategy, and uses an adaptive marginal distribution computation method to compute the marginal distributions of attribute pairs and construct a Markov network. Second, it replaces mutual information with a new measure of the correlation between attribute pairs, introduces a threshold filtering technique based on high-pass filtering to shrink the search space when constructing the probabilistic graph, and combines sufficient triangulation with the junction tree algorithm to obtain a set of attribute subsets. Finally, it computes the joint distributions over the attribute subsets based on joint distribution decomposition and redundancy elimination. Experiments on four real datasets show that PrivHDP outperforms comparable algorithms in k-way query accuracy and SVM classification accuracy, demonstrating the utility and efficiency of the proposed method.
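The contrast the abstract draws between random sampling and privacy budget splitting can be illustrated with a minimal sketch. The abstract does not say which perturbation primitive PrivHDP uses, so this example assumes generalized randomized response (GRR) and samples a single attribute per user rather than an attribute pair; the function names `grr_probs`, `grr_perturb`, `grr_estimate`, and `sample_and_report` are hypothetical and not taken from the paper.

```python
import math
import random
from collections import Counter


def grr_probs(eps, k):
    """Return (p, q) for GRR over a domain of size k with budget eps > 0.

    p is the probability of reporting the true value,
    q the probability of reporting each specific other value.
    """
    denom = math.exp(eps) + k - 1
    return math.exp(eps) / denom, 1.0 / denom


def grr_perturb(value, k, eps, rng):
    """Perturb one categorical value in range(k) with GRR."""
    p, _ = grr_probs(eps, k)
    if rng.random() < p:
        return value
    # Otherwise pick uniformly among the k - 1 values different from `value`.
    other = rng.randrange(k - 1)
    return other if other < value else other + 1


def grr_estimate(reports, k, eps):
    """Unbiased frequency estimates from GRR reports (entries may be negative)."""
    p, q = grr_probs(eps, k)
    n = len(reports)
    counts = Counter(reports)
    return [(counts.get(v, 0) / n - q) / (p - q) for v in range(k)]


def sample_and_report(record, domain_sizes, eps, rng):
    """Assumed sampling scheme: the user reports ONE randomly chosen
    attribute with the full budget eps, instead of reporting all
    d attributes with eps / d each."""
    j = rng.randrange(len(record))
    return j, grr_perturb(record[j], domain_sizes[j], eps, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    d, k, eps = 5, 4, 1.0
    p_full, _ = grr_probs(eps, k)        # sampling: full budget on one attribute
    p_split, _ = grr_probs(eps / d, k)   # splitting: eps / d on each attribute
    print(f"keep-probability with sampling:  {p_full:.3f}")
    print(f"keep-probability with splitting: {p_split:.3f}")
    print(sample_and_report([0, 1, 2, 3, 0], [k] * d, eps, rng))
```

Under these assumptions, sampling gives each report a higher probability of carrying the true value, at the cost of each attribute being reported by only about n/d users; which side of that trade-off wins depends on the budget, the domain size, and the number of attributes.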
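Keywords: local differential privacy; high-dimensional data; data publication; marginal distribution; joint distribution
Classification: TP309 [Automation and Computer Technology: Computer System Architecture]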