基于互联网大数据的脱敏分析技术研究  被引量:15

Data Masking Analysis Based on Internet Big Data

在线阅读下载全文

作  者:周倩伊 王亚民[1] 王闯 Zhou Qianyi ,Wang Yamin ,Wang Chuang(School of Economics and Management, Xidian University, Xi'an 710126, Chin)

机构地区:[1]西安电子科技大学经济与管理学院,西安710126

出  处:《数据分析与知识发现》2018年第2期58-63,共6页Data Analysis and Knowledge Discovery

摘  要:【目的】基于现有的脱敏技术,改进匿名组的划分效果,得到较优的脱敏模型及算法。【方法】基于k-匿名技术,改进维度划分标准,以KD树作为存储结构,构造新算法。利用Python实现程序,比较所产生的匿名组数量、NCP百分比,验证算法的可行性与有效性。【结果】新算法能够使得脱敏后整个数据集所生成的匿名组个数达到最大。且NCP百分比低于同类算法。【局限】对于有某一属性离散程度显著的数据集,循环计算划分维度较为繁琐。【结论】新算法相比于传统算法增加了匿名组个数,相比于同类算法,信息损失较低。[Objective] This paper aims to improve the classification results of anonymous groups and then obtain better data masking model and algorithm. [Methods] First, we modified the dimension judgment standards based on k-anonymity. Then, we used the KD tree as storage structure to construct a new algorithm. Third, we implemented the proposed algorithm with Python. Finally, we examined the feasibility and effectiveness of the new algorithm with the number of anonymous groups and the percentage of NCP. [Results] The new algorithm could maximize the number of anonymous groups generated by the whole dataset, while the percentage of NCP was lower than similar algorithms. [Limitations] For datasets with significant degree of dispersion, the dimension of the loop computation was cumbersome. [Conclusions] The proposed algorithm could improve the availability of the anonymous groups and reduce the data loss.

关 键 词:数据脱敏 K-匿名模型 取整划分 

分 类 号:TP391[自动化与计算机技术—计算机应用技术] G35[自动化与计算机技术—计算机科学与技术]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象