检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
作 者:徐捷 杨庚[1,2] 白云璐 XU Jie;YANG Geng;BAI Yun-lu(School of Computer Science,Nanjing University of Posts and Telecommunications,Nanjing 210046,China;Jiangsu Province Key Laboratory of Big Data Security and Intelligent Processing,Nanjing 210023,China;School of Information Technology,Nanjing University of Traditional Chinese Medicine,Nanjing 210003,China)
机构地区:[1]南京邮电大学计算机学院,江苏南京210046 [2]江苏省大数据安全与智能处理重点实验室,江苏南京210023 [3]南京中医药大学信息技术学院,江苏南京210003
出 处:《计算机技术与发展》2022年第5期80-86,共7页Computer Technology and Development
基 金:国家自然科学基金资助项目(61872197,61972209)。
摘 要:频繁子图挖掘是频繁模式挖掘的一种具体形式,广泛应用于社会网络分析、生物技术、推荐系统等领域。然而,图数据集中可能包含一些敏感的信息,在挖掘过程中或发布频繁子图信息时都可能造成隐私的泄露。对此,提出一种面向差分隐私保护的top-k子图挖掘算法——DP-TGM(Differential Private Top-k subGraph Mining)。算法首先依据挖掘出的频繁点和边对数据集剪枝,然后将频繁的边依次进行扩展挖掘,得到最终的top-k频繁子图。该算法使用一个优先权队列存储临时挖掘到的前k个最频繁的子图,在扩展挖掘的过程中不断更新队列里的元素,并将阈值始终更新为队列里的最小噪音支持度,减少图的扩展次数。算法使用拉普拉斯机制在三个不同的阶段对子图的真实支持度添加噪音,并且采用均分法和特殊级数法对隐私预算进行合理的分配以提高数据可用性。文章用理论证明算法满足ε-差分隐私保护,且在不同规模的数据集上验证了算法的可用性。Frequent subgraph mining is a specific form of frequent pattern mining, which is widely used in social network analysis, biotechnology, recommendation system and other fields. However, graph datasets may contain some sensitive information, which may lead to privacy leakage in the process of mining or publishing frequent subgraph information. To solve this problem, a DP-TGM(Differential Private Top-k subGraph Mining) is proposed. Firstly, the algorithm prunes the dataset according to the frequent vertices and edges, and then extends the frequent edges to get the final top-k frequent subgraph. The algorithm uses a priority queue to store the first k most frequent subgraphs which are temporarily mined. In the process of expanding mining, the elements in the queue are constantly updated, and the threshold is always updated to the minimum noise support in the queue, so as to reduce the number of graph expansion. The algorithm uses Laplacian mechanism to add noise to the true support of subgraphs in three different stages, and uses the averaging method and the special progression method to allocate the privacy budget reasonably to improve the data availability. It is proved theoretically that the algorithm satisfies the differential privacy protection, and the usability of the algorithm is verified on different data sets.
关 键 词:top-k频繁子图 差分隐私 拉普拉斯机制 隐私预算 数据可用性
分 类 号:TP309[自动化与计算机技术—计算机系统结构]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.38