A Clustering-Oriented Privacy-Preserving Data Publishing Method (一种面向聚类的隐私保护数据发布方法)  Cited by: 14

A Privacy-Preserving Data Publishing Algorithm for Clustering Application


Authors: Chong Zhihong (崇志宏)[1], Ni Weiwei (倪巍伟)[1], Liu Tengteng (刘腾腾)[1], Zhang Yong (张勇)[1]

Affiliation: [1] School of Computer Science and Engineering, Southeast University, Nanjing 210096

Source: Journal of Computer Research and Development (《计算机研究与发展》), 2010, No. 12, pp. 2083-2089 (7 pages)

Funding: National Natural Science Foundation of China (61003057; 60973023); Natural Science Foundation of Jiangsu Province (BK2006095)

Abstract: Privacy-preserving micro-data publishing aims to protect sensitive data while keeping the published data usable. Most existing publishing methods are limited to categorical data, and the utility they preserve is mainly that of aggregate queries, frequent itemset mining, and classification. This paper addresses another important data mining task, clustering analysis, and the problem of obfuscating the numerical attributes that clustering commonly handles. It proposes the obfuscation algorithm NeSDO. By analyzing density within the neighborhood of a given record, the algorithm characterizes each record's effect on maintaining clustering usability and introduces definitions of individual data records and common data records. For each individual data record, a positive neighborhood record set and a negative neighborhood record set are defined. NeSDO perturbs the data by replacing original micro-data values with synthetic statistical values: a common data record is replaced by the mean of the records in its k-nearest neighborhood, and an individual data record is replaced by the mean of the records in its positive or negative neighborhood. Theoretical analysis and experimental results indicate that NeSDO protects sensitive numerical values from disclosure while effectively maintaining the clustering usability of the published data.
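The substitution step for common data records can be illustrated with a minimal sketch. This is not the authors' NeSDO implementation: the abstract does not give the density criterion that separates individual from common records, nor the construction of the positive and negative neighborhoods, so the sketch treats every record as a common record. The function name `knn_mean_substitute`, the use of Euclidean distance, and the choice to exclude a record from its own neighborhood are all assumptions made for illustration.

```python
import math


def knn_mean_substitute(records, k):
    """Replace each record with the mean of its k nearest neighbours.

    Assumptions (not taken from the paper): Euclidean distance, the record
    itself is excluded from its neighbourhood, and every record is treated
    as a "common" record.
    """
    if not 1 <= k < len(records):
        raise ValueError("k must satisfy 1 <= k < number of records")

    published = []
    for i, record in enumerate(records):
        # Indices of all other records, ordered by distance to this record.
        others = [j for j in range(len(records)) if j != i]
        others.sort(key=lambda j: math.dist(record, records[j]))
        neighbours = [records[j] for j in others[:k]]

        # Attribute-wise mean of the k nearest neighbours.
        mean = tuple(
            sum(n[d] for n in neighbours) / k for d in range(len(record))
        )
        published.append(mean)
    return published


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 2.0), (2.0, 0.0), (10.0, 10.0)]
    for original, synthetic in zip(data, knn_mean_substitute(data, k=2)):
        print(original, "->", synthetic)
```

In this sketch the published values are synthetic means rather than original values, which is the general idea the abstract describes. Handling individual data records would require the paper's neighborhood definitions.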

Keywords: privacy-preserving data publishing; clustering; k-neighborhood; individual data record; common data record

Classification: TP311.13 [Automation and Computer Technology - Computer Software and Theory]

 
