K-anonymity Mechanism Based on Iterative Binary Clustering


Authors: Wang Tao; Tan Hu; Xu Tingting; Xin Baojiang; Liu Gang; Zhou Pan [2]

Affiliations: [1] State Grid Shandong Electric Power Company Weifang Power Supply Company, Weifang, Shandong 261021; [2] School of Cyber Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074

Source: Journal of Information Security Research, 2023, No. 5, pp. 402-411 (10 pages)

Funding: Science and Technology Project of State Grid Shandong Electric Power Company (2020A-027).

Abstract: As data sharing spreads across fields, protecting the privacy of the individuals described in shared data has become an increasingly prominent problem. K-anonymity, an advanced privacy-protection theory, is widely used in data sharing and distribution. Because K-anonymity protects privacy by generalizing data, it inevitably causes some information loss, so how to preserve data availability and reduce information loss while still satisfying K-anonymity is a question worth studying. To address it, this paper proposes KABIBC (K-anonymous algorithm based on iterative binary clustering), a K-anonymity algorithm for numerical data. The within-group sum of distance (WGSD) is defined first, and all tuples in the data table are treated as a single cluster. That cluster is then split in two using an iterative strategy, and the resulting sub-clusters are processed recursively in the same way. During each bisection, the assignment of tuples to the two sub-clusters is adjusted according to the principle of minimizing information loss, until the smallest sub-clusters that satisfy the K-anonymity requirement are obtained, so that the information loss tends toward the optimum. Theoretical and experimental analyses show that the mechanism effectively reduces information loss while also running efficiently.
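The recursive bisection described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' KABIBC implementation: the abstract gives neither the exact WGSD formula nor the tuple-adjustment rule, so the sketch assumes WGSD is the sum of Euclidean distances from each tuple to its cluster centroid, uses a 2-means-style refinement for the split, and "adjusts" the assignment only by moving the cut so that both sides keep at least k tuples. The names `wgsd`, `bisect`, `kabibc_sketch` and `generalize` are hypothetical.

```python
import math
import random


def centroid(cluster):
    """Component-wise mean of a list of numeric tuples."""
    n = len(cluster)
    return tuple(sum(col) / n for col in zip(*cluster))


def wgsd(cluster):
    """ASSUMED definition: sum of Euclidean distances to the centroid."""
    c = centroid(cluster)
    return sum(math.dist(t, c) for t in cluster)


def bisect(cluster, k, iters=20, seed=0):
    """Split a cluster of >= 2k tuples into two parts of >= k tuples each."""
    rng = random.Random(seed)
    a, b = rng.sample(cluster, 2)  # two tuples as initial centres
    left, right = [], []
    for _ in range(iters):
        # Rank tuples by how much closer they are to centre a than to b.
        ranked = sorted(cluster, key=lambda t: math.dist(t, a) - math.dist(t, b))
        cut = sum(1 for t in ranked if math.dist(t, a) <= math.dist(t, b))
        # Assumed adjustment step: shift the cut so both sides keep >= k.
        cut = max(k, min(len(cluster) - k, cut))
        new_left, new_right = ranked[:cut], ranked[cut:]
        if new_left == left:  # assignment is stable, stop iterating
            break
        left, right = new_left, new_right
        a, b = centroid(left), centroid(right)
    return left, right


def kabibc_sketch(table, k):
    """Recursively bisect until no cluster can be split into two k-groups."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(table) < 2 * k:
        return [table]
    left, right = bisect(table, k)
    return kabibc_sketch(left, k) + kabibc_sketch(right, k)


def generalize(cluster):
    """Replace a cluster by one (min, max) range per attribute."""
    return [(min(col), max(col)) for col in zip(*cluster)]


if __name__ == "__main__":
    rng = random.Random(1)
    table = [(rng.uniform(18, 80), rng.uniform(1e4, 9e4)) for _ in range(10)]
    for group in kabibc_sketch(table, k=3):
        print(len(group), round(wgsd(group), 1), generalize(group))
```

Under these assumptions every returned group holds between k and 2k-1 tuples, which corresponds to the "smallest sub-clusters that satisfy the K-anonymity requirement" in the abstract. In this sketch `wgsd` is only reported per group; the paper uses its own WGSD definition to guide the split.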

Keywords: iterative optimization; binary clustering; privacy protection; K-anonymity; generalization

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
