Research on Non-Interactive Query of Big Data Privacy Based on Machine Learning


Authors: LI Jing (李静), ZHAO Qing-shan (赵青杉)[1], GAO Yuan (高媛)[2]

Affiliations: [1] Computer Department, Xinzhou Teachers University, Xinzhou, Shanxi 034000, China; [2] College of Big Data, North University of China, Taiyuan, Shanxi 030051, China

Source: Computer Simulation (《计算机仿真》), 2023, No. 8, pp. 334-338 (5 pages)

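Funding: Shanxi Province Undergraduate Teaching Quality Improvement Project (Teaching Reform and Innovation Project) (J2020291); Xinzhou Teachers University Undergraduate Innovation and Entrepreneurship Training Program (201924).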

Abstract: In a big data environment, the aggregation of massive amounts of data can easily cause serious privacy leakage. Because the degree of association between big data packets is uncertain, privacy-preserving queries over big data are difficult. This paper therefore proposes a non-interactive query method for big data privacy based on machine learning. First, an association rule mining algorithm measures the association degree of the data packets in the initial big data set; this degree is compared with a preset threshold, and the data whose association degree falls below the threshold are selected to construct the big data feature set. Next, the K-means clustering algorithm partitions the feature set to obtain the query set, and Laplace noise is added to the query set so that it satisfies the differential privacy standard. Finally, a training sample set is constructed and trained with a linear regression algorithm, yielding query results that satisfy differential privacy protection. Experimental results show that the proposed method significantly reduces the computational load of non-interactive privacy queries over big data and saves considerable time and space.
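The abstract names four steps but gives no formulas or parameters. The Python sketch below is a toy illustration of that pipeline under assumptions that are ours, not the paper's: the "association degree" is approximated by the absolute Pearson correlation between attribute columns (a swapped-in stand-in for association rule mining), K-means is plain Lloyd iteration, the query set is the vector of per-cluster counts perturbed by the Laplace mechanism, and the linear regression is an ordinary least-squares line fitted to noisy cumulative counts so that range-count queries can be answered without revisiting the raw data. The function names `select_low_association`, `kmeans`, `noisy_counts` and `fit_line`, as well as the threshold 0.8, k = 5 and epsilon = 1.0, are hypothetical. Only the counts are noised; the column filter, the clustering and the centroids read raw data and are not privatized here, so the epsilon guarantee covers only the counts for a fixed partition.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Absolute Pearson correlation, used here as a stand-in 'association degree'."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else abs(cov / (sx * sy))


def select_low_association(rows, threshold):
    """Step 1 (stand-in): keep a column only if its association with every
    column kept so far is below `threshold`; return (reduced rows, kept indices)."""
    cols = list(zip(*rows))
    kept = []
    for j, col in enumerate(cols):
        if all(pearson(col, cols[i]) < threshold for i in kept):
            kept.append(j)
    return [[row[j] for j in kept] for row in rows], kept


def kmeans(points, k, iters=20, rng=None):
    """Step 2: plain Lloyd's K-means; returns (labels, centers)."""
    rng = rng or random.Random(0)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous center
                centers[c] = [statistics.fmean(dim) for dim in zip(*members)]
    return labels, centers


def noisy_counts(labels, k, epsilon, rng):
    """Step 3: Laplace mechanism on per-cluster counts. Adding or removing one
    record changes exactly one count by 1 (L1 sensitivity 1), so Laplace noise
    of scale 1/epsilon gives epsilon-DP for a fixed partition."""
    scale = 1.0 / epsilon
    counts = [0] * k
    for lab in labels:
        counts[lab] += 1
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return [c + rng.expovariate(1 / scale) - rng.expovariate(1 / scale) for c in counts]


def fit_line(xs, ys):
    """Step 4: ordinary least-squares fit y ~ a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical toy data: columns 1 and 2 are near-copies of column 0,
    # so only column 0 should survive the association filter.
    rows = []
    for _ in range(500):
        a = rng.uniform(0, 10)
        rows.append([a, 2 * a + rng.gauss(0, 0.1), -a + rng.gauss(0, 0.1)])
    features, kept = select_low_association(rows, threshold=0.8)
    k = 5
    labels, centers = kmeans(features, k, rng=rng)
    noisy = noisy_counts(labels, k, epsilon=1.0, rng=rng)
    # Step 4 training samples: (centroid, noisy count below the centroid),
    # crediting half of each cluster as lying below its own centroid.
    # Assumption: centroids are treated as public; they are NOT privatized here.
    xs, ys, running = [], [], 0.0
    for c in sorted(range(k), key=lambda j: centers[j][0]):
        xs.append(centers[c][0])
        ys.append(running + noisy[c] / 2)
        running += noisy[c]
    intercept, slope = fit_line(xs, ys)
    print("kept columns:", kept)
    print("approx. number of records with x <= 5:", round(intercept + slope * 5))
```

Because the regression in step 4 consumes only the noisy counts (plus the centroids, which this sketch treats as public), it is post-processing of the Laplace output and, under that assumption, adds no further privacy cost; a fully private pipeline would also have to protect the filtering and clustering steps, which the abstract does not describe.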

Keywords: machine learning; big data privacy protection; non-interactive query; Laplace noise; linear regression algorithm

CLC number: TP309.2 [Automation and Computer Technology / Computer System Architecture]
