Fuzzy C-Means Clustering Algorithm Based on Mixed Noise-aware under Local Differential Privacy


Authors: ZHANG Pengfei; CHENG Jun; ZHANG Zhikun; FANG Xianjin [1]; SUN Li; WANG Jie [5]; JIANG Rong

Affiliations: [1] School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan 232001, China; [2] Yunnan Key Laboratory of Service Computing (Yunnan University of Finance and Economics), Kunming 650221, China; [3] College of Computer Science and Technology, Zhejiang University, Hangzhou 310058, China; [4] School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China; [5] School of Safety Science and Engineering, Anhui University of Science and Technology, Huainan 232001, China

Source: Journal of Electronics & Information Technology, 2025, No. 3, pp. 739-757 (19 pages)

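Funding: Research Start-up Fund for High-level Introduced Talents of Anhui University of Science and Technology (2023yjrc92); Open Project of the Yunnan Key Laboratory of Service Computing (YNSC24116); National Natural Science Foundation of China (62202164).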

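Abstract: In big data and Internet of Things (IoT) applications, Local Differential Privacy (LDP) is used to protect user privacy in cluster analysis. Existing methods, however, either cluster interactively under LDP, which consumes a large privacy budget, or fail to account jointly for the Gaussian noise in the clustering data, which reflects data quality, and the Laplacian noise injected to satisfy LDP, and so achieve low clustering accuracy. In addition, the distance used to compare user-submitted data with cluster centers is chosen rather arbitrarily and does not fully exploit the noise patterns contained in the submitted noisy data. To address this, the paper proposes a mixed noise-aware fuzzy C-means clustering algorithm (mnFCM) that satisfies LDP. Its main idea is to model both the Gaussian noise in uploaded user data, which reflects user quality, and the Laplacian noise injected to protect that data, and then to design a mixed noise-aware distance that replaces the traditional Euclidean distance for measuring similarity between samples and cluster centers. Specifically, the paper first designs the mixed noise-aware distance calculation, then gives a new objective function for the algorithm, designs a solution method based on the Lagrange multiplier method, and finally analyzes the convergence of the solution algorithm theoretically. The paper further analyzes the privacy, utility, and complexity of mnFCM theoretically. The analysis shows that the proposed algorithm strictly satisfies LDP, yields cluster centers closer to the non-private ones than the compared algorithms do, and has complexity close to that of the non-private algorithm. Experiments on two real datasets show that, while satisfying LDP, mnFCM improves clustering accuracy by 10%~15%.

Objective: In big data and Internet of Things (IoT) applications, clustering analysis of collected data is crucial for enhancing user experience. To mitigate the privacy risks of using raw data directly, Local Differential Privacy (LDP) techniques are often employed. However, existing LDP clustering studies either require interactive execution, which consumes significant privacy budget, or fail to balance the Gaussian noise in the clustering data with the Laplacian noise added for LDP protection, resulting in low clustering accuracy. Moreover, distance metrics for similarity measurement are chosen arbitrarily, without fully using the noise characteristics of user-submitted noisy data. This study designs a hybrid noise-aware distance calculation method and integrates it into the fuzzy C-means clustering algorithm, reducing the impact of noise on clustering results while protecting data privacy, so that both privacy and clustering quality are preserved. It provides a robust solution for processing sensitive information in high-dimensional data environments.

Methods: This paper proposes a mixed noise-aware Fuzzy C-Means clustering algorithm (mnFCM) under LDP. The core idea is to model both the Gaussian noise (representing data quality) and the Laplacian noise (added for data protection) in uploaded user data by constructing a more accurate mixed distribution model, and to design a mixed noise-aware distance that replaces the Euclidean distance for measuring similarity between samples and cluster centers. Specifically, in mnFCM, the paper first designs a mixed noise-aware distance calculation method. On this basis, a new objective function for the algorithm is proposed, and a solution method is designed based on the Lagrange multiplier method. Finally, the convergence of the solution algorithm is analyzed theoretically.

Results and Discussions: The experimental results show that as the privacy budget ε increases, the performance of the various clustering algorithms generally improves. Notably, mnFCM achieves at least an 8.5% improvement in accuracy compared to the state-of-the-art PrivPro algorithm (Fi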
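The abstract does not give the mixed noise-aware distance itself, so it cannot be reproduced here. As a rough illustration of the surrounding pipeline only, the sketch below perturbs each user's record with the Laplace mechanism and then runs standard fuzzy C-means with the ordinary Euclidean distance, which is the baseline that mnFCM is described as replacing. The function names `laplace_perturb` and `fuzzy_c_means`, the assumption that every feature is scaled to [0, 1], the fuzzifier `m = 2`, and the synthetic data are all illustrative choices, not taken from the paper.

```python
import numpy as np


def laplace_perturb(records, epsilon, rng):
    """Perturb each record locally with the Laplace mechanism.

    Assumption: every feature lies in [0, 1], so any two d-dimensional
    records differ by at most d in L1 norm, and per-coordinate Laplace
    noise with scale d / epsilon gives epsilon-LDP for the whole record.
    """
    d = records.shape[1]
    noise = rng.laplace(loc=0.0, scale=d / epsilon, size=records.shape)
    return records + noise


def fuzzy_c_means(data, n_clusters, m=2.0, n_iter=100, tol=1e-6, rng=None):
    """Standard fuzzy C-means with squared Euclidean distance (m > 1)."""
    rng = np.random.default_rng() if rng is None else rng
    n = data.shape[0]
    # Random initial memberships; each row sums to 1.
    u = rng.random((n, n_clusters))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        um = u ** m
        # Center update: membership-weighted mean of the data.
        centers = (um.T @ data) / um.sum(axis=0)[:, None]
        # Squared distances from every sample to every center.
        dist2 = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        dist2 = np.maximum(dist2, 1e-12)  # avoid division by zero
        # Membership update obtained from the Lagrangian of the
        # FCM objective under the sum-to-one constraint.
        inv = dist2 ** (-1.0 / (m - 1.0))
        new_u = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(new_u - u).max() < tol
        u = new_u
        if converged:
            break
    return centers, u


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for user data: two groups in [0, 1]^2.
    clean = np.vstack([
        rng.normal(0.2, 0.02, size=(200, 2)),
        rng.normal(0.8, 0.02, size=(200, 2)),
    ]).clip(0.0, 1.0)
    noisy = laplace_perturb(clean, epsilon=4.0, rng=rng)
    centers, memberships = fuzzy_c_means(noisy, n_clusters=2, rng=rng)
    print(centers)
```

Because the Euclidean distance treats the injected Laplacian noise as if it were signal, the centers recovered from `noisy` drift away from those recovered from `clean` as `epsilon` shrinks; that gap is the one a noise-aware distance is meant to narrow.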

Keywords: cluster analysis; privacy protection; local differential privacy; fuzzy C-means clustering; Laplace mechanism

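Classification codes: TN911 [Electronics and Telecommunications - Communication and Information Systems]; TP391 [Electronics and Telecommunications - Information and Communication Engineering]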

 
