检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
机构地区:[1]哈尔滨工程大学计算机科学与技术学院,哈尔滨150001
出 处:《计算机研究与发展》2014年第1期126-137,共12页Journal of Computer Research and Development
基 金:国家自然科学基金项目(61073041;61073043);教育部高等学校博士学科点专项科研基金项目(20112304110011;20122304110012);哈尔滨市科技创新人才研究专项资金项目(优秀学科带头人)(2011RFXXG015)
摘 要:t-closeness模型是数据发布领域中用于抵御相似性攻击和偏斜攻击的一种有效方法,但其采用的EMD(earth mover's distance)距离没有考虑等价类与数据表间敏感属性分布的稳定性,不能全面地衡量分布间距离,在分布间稳定差异过大时会大大提高隐私泄露的风险.针对这种局限,提出了一种SABuk t-closeness模型,它在传统t-closeness模型的基础上,为更加准确地度量分布间距离,以EMD距离与KL散度(kullback-leibler divergence)结合构建距离度量标准.同时,根据敏感属性的层次树结构,对数据表进行语义相似性桶分组划分,然后采用贪心思想生成满足要求的最小等价类,并且运用k-近邻的思想来选取QI(quasi-identifiers)值相似的元组生成等价类.实验结果表明,SABuk t-closeness模型在牺牲少量时间的前提下减少了信息损失,能在有效地保护敏感信息不泄露的同时保持较高的数据效用.The t-closeness model is an effective model to prevent the data sets from skewness attack and similarity attack. But the EMD (earth mover's distance), which t-closeness used to measure the distance between distributions, is not well considering the stability between distributions, so it is hardly to entirely measure the distance between distributions. When the stability between distributions is too large, it will greatly increase the risk of privacy. Aim to address these limitations and accurately measure the distance between distributions, based on traditional t-closeness, the model of SABuk t-closeness which combined the EMD with KL divergence to construct a new distance measurement is proposed. At the same time, according to the hierarchy of sensitive attribute (SA), it partitions a table into buckets based on the semantic similarity of SA values, and then uses greedy algorithm for generating the minimum groups which is satisfied with the requirement of the distance between distributions. At the end, it has adopted the k-nearest neighbour algorithm to choose similar quasi-identifiers (QI) values. Experimental results indicate that SABuk t-closeness model can bring down the information loss on the premise of consuming a little time, and it can preserve privacy of sensitive data well meanwhile maintaining high data utility.
关 键 词:隐私保护 桶分组 t-closeness模型 EMD KL散度
分 类 号:TP309.2[自动化与计算机技术—计算机系统结构]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.171