Authors: LU Xiaotian (卢晓天); PIAO Chunhui (朴春慧); YANG Xingyu (杨兴雨); BAI Yingjie (白英杰)
Affiliations: [1] School of Information Science and Technology, Shijiazhuang Tiedao University, Shijiazhuang 050043, Hebei, China [2] Hebei Key Laboratory for Electromagnetic Environmental Effects and Information Processing, Shijiazhuang 050043, Hebei, China [3] CRSC Research & Design Institute Group Co., Ltd., Beijing 100070, China
Source: Computer Engineering (《计算机工程》), 2024, No. 5, pp. 167-181 (15 pages)
Funding: Hebei Province Key Research and Development Program (21355902D).
Abstract: Improving data utility while preserving privacy is a challenging problem in research on publishing high-dimensional structured data, and the classic PrivBayes algorithm offers one solution. To further reduce computational cost and improve data utility, ELPrivBayes, a differentially private data publishing algorithm based on Bayesian networks, is proposed. The theoretical computational cost of the Bayesian network structure-learning stage is analyzed, and a correlation matrix storing the Mutual Information (MI) between attributes is constructed, which avoids redundant MI computation during the iterations of the structure-learning algorithm and reduces time complexity. The order in which nodes enter the Bayesian network is optimized based on Average Mutual Information (AMI), which raises the expected MI contributed by the exponential mechanism during the structure-learning iterations and thereby improves the statistical similarity between the generated and original datasets. The low sensitivity of network structure quality to the choice of the first node is analyzed empirically. Experimental results on four typical datasets show that, compared with the classic PrivBayes algorithm and its improved variants, the computational cost of the structure-learning stage is reduced by 97%-99%, the MI captured through the exponential mechanism is increased by 14%-67%, the average variation distance between the generated and original datasets is reduced by 32%-40%, and the accuracy of the constructed Support Vector Machine (SVM) classifier is improved by 4%-5%. Moreover, when ε ≤ 0.8, the utility improvement of data generated by ELPrivBayes is more significant.
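The two ideas named in the abstract, caching pairwise mutual information in a matrix and ranking attributes by their average mutual information, can be illustrated with a rough sketch. This is not the authors' implementation: the function names (`mutual_information`, `mi_matrix`, `order_by_average_mi`) and the toy columns are assumptions made for illustration, the sketch covers only pairwise MI between discrete attributes, and it omits the differential privacy noise and the exponential mechanism entirely.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete columns."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def mi_matrix(columns):
    """Compute each pairwise MI once and store it symmetrically.

    Later lookups read the matrix instead of recomputing MI.
    The diagonal is left at 0.0 by convention (an assumption of this sketch).
    """
    d = len(columns)
    m = [[0.0] * d for _ in range(d)]
    for i, j in combinations(range(d), 2):
        m[i][j] = m[j][i] = mutual_information(columns[i], columns[j])
    return m


def order_by_average_mi(m):
    """Rank attribute indices by average MI with the other attributes, highest first."""
    d = len(m)
    avg = [sum(row) / (d - 1) for row in m]
    return sorted(range(d), key=lambda i: avg[i], reverse=True)


# Hypothetical toy data: b duplicates a, c is independent of both.
a = [0, 0, 1, 1]
b = [0, 0, 1, 1]
c = [0, 1, 0, 1]
M = mi_matrix([a, b, c])
order = order_by_average_mi(M)
```

In this toy case the independent attribute `c` has the lowest average MI and is ranked last. A differentially private version would need to perturb these scores or sample from them under a privacy budget, which this sketch does not attempt.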
Keywords: data publishing; Bayesian network; differential privacy; privacy protection; correlation matrix; average mutual information
Classification code: TP319 [Automation and Computer Technology: Computer Software and Theory]