k-Anonymous Data Privacy Protection Mechanism Based on Optimal Clustering (cited by: 10)

Authors: Zhang Qiang; Ye Ayong [1,2]; Ye Guohua [1,2]; Deng Huina; Chen Aimin

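Affiliations: [1] College of Computer and Cyber Security, Fujian Normal University, Fuzhou 350117; [2] Fujian Provincial Key Laboratory of Network Security and Cryptology (Fujian Normal University), Fuzhou 350117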

Source: Journal of Computer Research and Development, 2022, No. 7, pp. 1625-1635 (11 pages)

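Funding: National Natural Science Foundation of China (61972096, 61771140, 61872088, 61872090); Fujian Provincial University Industry-Academia Cooperation Project (2022H6025).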

Abstract: Emerging big-data technologies enable many organizations to collect massive amounts of information about individuals. Sharing this wealth of information presents enormous opportunities for data mining applications, but data privacy has been a major barrier. Clustering-based k-anonymity is the main technique for de-identifying shared data, and it can effectively defend private information against background-knowledge attacks and linkage attacks. Existing schemes, however, balance privacy and utility by seeking the optimal k-equivalence set. Viewed globally, a k-equivalence set is not necessarily the optimal equivalence set satisfying k-anonymity, so the problem of optimizing the utility of the privacy mechanism remains unsolved. To address this problem, the paper proposes a k-anonymity privacy protection mechanism based on optimal clustering. It formulates the optimal data release problem as minimizing information loss under a privacy constraint. By establishing a functional relationship between data distance and information loss, which captures the trade-off between privacy and accuracy, it transforms the optimization problem of the k-anonymity mechanism into an optimal clustering problem on the dataset. It then combines a greedy algorithm with a bisection (dichotomy) mechanism to find the optimal clustering that satisfies the k-anonymity constraint, thereby optimizing the utility of the k-anonymity model. Finally, it gives a theoretical proof of the solution and an experimental analysis. Experiments on real datasets show that the mechanism reduces the information loss of clustering-based anonymization to the greatest extent and is feasible and effective in terms of running time.
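For illustration only: the abstract does not spell out the paper's algorithm, so the Python sketch below shows a generic greedy clustering-based k-anonymization over numeric quasi-identifiers. It is not the optimal-clustering method the authors propose. The function names, the normalized Manhattan distance, and the range-based information-loss measure are all assumptions made for this sketch.

```python
from typing import List, Tuple

Record = Tuple[float, ...]


def attribute_spans(records: List[Record]) -> List[float]:
    """Range of each attribute over the whole table (1.0 if constant)."""
    dims = len(records[0])
    return [
        (max(r[d] for r in records) - min(r[d] for r in records)) or 1.0
        for d in range(dims)
    ]


def distance(a: Record, b: Record, spans: List[float]) -> float:
    """Normalized Manhattan distance (an assumed metric for this sketch)."""
    return sum(abs(x - y) / s for x, y, s in zip(a, b, spans))


def greedy_k_clusters(records: List[Record], k: int) -> List[List[Record]]:
    """Greedily form clusters of at least k records each."""
    if k < 1 or len(records) < k:
        raise ValueError("need 1 <= k <= len(records)")
    spans = attribute_spans(records)
    remaining = sorted(records)
    clusters: List[List[Record]] = []
    while len(remaining) >= k:
        seed = remaining.pop(0)
        # Take the k - 1 records closest to the seed.
        remaining.sort(key=lambda r: distance(seed, r, spans))
        clusters.append([seed] + remaining[: k - 1])
        remaining = sorted(remaining[k - 1:])
    # Fewer than k leftovers: attach each to the cluster with the nearest seed.
    for r in remaining:
        min(clusters, key=lambda c: distance(c[0], r, spans)).append(r)
    return clusters


def information_loss(clusters: List[List[Record]], spans: List[float]) -> float:
    """Sum over clusters of size times the normalized width of each
    generalized interval (an assumed loss measure, not the paper's)."""
    total = 0.0
    for c in clusters:
        for d, s in enumerate(spans):
            lo = min(r[d] for r in c)
            hi = max(r[d] for r in c)
            total += len(c) * (hi - lo) / s
    return total


# Hypothetical (age, income in thousands) quasi-identifiers.
records = [(25, 50), (27, 52), (26, 51), (60, 90), (62, 95), (61, 91), (40, 70)]
clusters = greedy_k_clusters(records, k=3)
loss = information_loss(clusters, attribute_spans(records))
```

In this toy table the greedy pass places the record (40, 70) in the same cluster as the records near (60, 90), which widens that cluster's generalized intervals. This is the kind of locally chosen grouping whose total information loss a globally optimal clustering would aim to reduce.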

Keywords: privacy protection; k-anonymity; clustering; optimization; information loss; data publishing

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
