Research on Differential Privacy High-Dimensional Data Publishing Technology Based on Bayesian Networks

Authors: 卢晓天 (LU Xiaotian), 朴春慧 (PIAO Chunhui), 杨兴雨 (YANG Xingyu), 白英杰 (BAI Yingjie)

Affiliations: [1] School of Information Science and Technology, Shijiazhuang Tiedao University, Shijiazhuang 050043, Hebei, China; [2] Hebei Key Laboratory for Electromagnetic Environmental Effects and Information Processing, Shijiazhuang 050043, Hebei, China; [3] CRSC Research & Design Institute Group Co., Ltd., Beijing 100070, China

Source: Computer Engineering (《计算机工程》), 2024, No. 5, pp. 167-181 (15 pages)

Funding: Key Research and Development Program of Hebei Province (21355902D).

Abstract: Improving data utility while providing privacy protection is a challenging problem in publishing high-dimensional structured data, and the classic PrivBayes algorithm offers one solution to it. To further reduce computational overhead and improve data utility, this paper proposes ELPrivBayes, a differentially private data-publishing algorithm based on Bayesian networks. The paper analyzes the theoretical computational cost of the Bayesian network structure-learning stage and constructs a correlation matrix that stores the Mutual Information (MI) between attributes; this avoids redundant MI computation across the iterations of the structure-learning algorithm and reduces its time complexity. Based on the Average MI (AMI), the order in which nodes enter the Bayesian network is optimized, which raises the expected MI contributed by the exponential mechanism in each structure-learning iteration and thereby improves the statistical similarity between the generated and original datasets. The paper also shows empirically that the quality of the network structure is insensitive to the choice of the first node. Experimental results on four typical datasets show that, compared with the classic PrivBayes algorithm and its improved variants, ELPrivBayes reduces the computational cost of the structure-learning stage by 97%-99%, increases the MI captured via the exponential mechanism by 14%-67%, reduces the average variation distance between the generated and original datasets by 32%-40%, and improves the accuracy of the resulting Support Vector Machine (SVM) classifier by 4%-5%. Moreover, when ε ≤ 0.8, the utility gain of data generated by ELPrivBayes is more pronounced.
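The abstract's first optimization, computing pairwise MI once into a matrix so that structure learning only performs lookups, can be illustrated with a minimal sketch. It assumes a simplified PrivBayes-style setting in which every node gets exactly one parent (degree k = 1), the first node is drawn at random, and the structure-learning budget is split evenly over the d - 1 selections. All function names (`mutual_information`, `mi_matrix`, `average_mi`, `exp_mech`, `learn_structure`) are hypothetical, and the sensitivity value is an illustrative placeholder rather than a derived bound. The sketch also computes AMI from the same matrix, but the abstract does not state the exact rule ELPrivBayes uses to turn AMI into a node order, so that rule is not reproduced here.

```python
import math
import random
from collections import Counter
from itertools import combinations


def mutual_information(rows, i, j):
    """Empirical mutual information I(X_i; X_j) in bits over a list of tuples."""
    n = len(rows)
    joint = Counter((r[i], r[j]) for r in rows)
    count_i = Counter(r[i] for r in rows)
    count_j = Counter(r[j] for r in rows)
    # p(a,b) / (p(a) p(b)) simplifies to c * n / (count_i[a] * count_j[b])
    return sum((c / n) * math.log2(c * n / (count_i[a] * count_j[b]))
               for (a, b), c in joint.items())


def mi_matrix(rows, d):
    """Symmetric d x d matrix of pairwise MI, computed once up front.

    Every later score is an O(1) lookup instead of another pass over the data.
    It is computed on raw data, so the matrix itself must never be published.
    """
    m = [[0.0] * d for _ in range(d)]
    for i, j in combinations(range(d), 2):
        m[i][j] = m[j][i] = mutual_information(rows, i, j)
    return m


def average_mi(m):
    """Average MI of each attribute with all other attributes (requires d >= 2)."""
    d = len(m)
    return [sum(m[i][j] for j in range(d) if j != i) / (d - 1) for i in range(d)]


def exp_mech(candidates, scores, epsilon, sensitivity, rng):
    """Exponential mechanism: pick a candidate with probability proportional to
    exp(epsilon * score / (2 * sensitivity))."""
    top = max(scores)  # shift by the max score for numerical stability
    weights = [math.exp(epsilon * (s - top) / (2 * sensitivity)) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


def learn_structure(m, epsilon, sensitivity, rng):
    """Greedy PrivBayes-style structure learning with one parent per node (k = 1).

    epsilon is the structure-learning share of the privacy budget, split
    evenly over the d - 1 exponential-mechanism selections.
    """
    d = len(m)
    first = rng.randrange(d)  # first node chosen uniformly at random
    order, parents = [first], {first: None}
    eps_step = epsilon / max(d - 1, 1)
    while len(order) < d:
        # every (child outside the network, parent inside it) pair is a candidate
        cands = [(c, p) for c in range(d) if c not in parents for p in order]
        scores = [m[c][p] for c, p in cands]  # read from the cached matrix
        child, parent = exp_mech(cands, scores, eps_step, sensitivity, rng)
        parents[child] = parent
        order.append(child)
    return order, parents


if __name__ == "__main__":
    rng = random.Random(0)
    rows = []
    for _ in range(2000):
        a = rng.randint(0, 1)
        b = a if rng.random() < 0.9 else 1 - a  # b mostly copies a
        c = rng.randint(0, 2)                   # c is independent noise
        rows.append((a, b, c))
    m = mi_matrix(rows, 3)
    print("AMI (not private, inspection only):", [round(x, 3) for x in average_mi(m)])
    # sensitivity=0.01 is an illustrative placeholder, not a derived bound
    order, parents = learn_structure(m, epsilon=1.0, sensitivity=0.01, rng=rng)
    print("insertion order:", order, "parents:", parents)
```

The design point is that each iteration of the greedy loop scores many (child, parent) candidates, and without the cache each score would require another pass over the dataset. With the matrix, the data is scanned once per attribute pair, and the iterations only perform lookups. This is the kind of redundancy the abstract says the correlation matrix removes. A real implementation with k > 1 would need MI between an attribute and a parent *set*, which a pairwise matrix alone does not provide.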

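Keywords: data publishing; Bayesian network; differential privacy; privacy protection; correlation matrix; average mutual information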

CLC number: TP319 [Automation and Computer Technology / Computer Software and Theory]
