检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
作 者:杨舒丹 李男[1,2] 郑文娟 杜启明 YANG Shudan;LI Nan;ZHENG WenJuan;DU Qiming(Department of Cyberspace Security Academy,Strategic Support Force Information Engineering University,Zhengzhou 450000,China;State Key Laboratory of Mathematical Engineering and Advanced Computing,Zhengzhou 450000,China;People’s Liberation Army 32142 Unit,Baoding 071000,China)
机构地区:[1]战略支援部队信息工程大学网络空间安全学院,郑州450000 [2]数学工程与先进计算国家重点实验室,郑州450000 [3]32142部队保定,071000
出 处:《信息安全学报》2023年第4期113-125,共13页Journal of Cyber Security
基 金:国家自然科学基金资助项目(No.62472447)资助。
摘 要:利用K-means算法对用户信息进行聚类时,存在隐私泄露的风险。差分隐私保护技术可提供严格的隐私保护,但目前大多数满足差分隐私的K-means算法在处理多维数据时,存在随机选择质心和噪声添加不均衡的问题,因而导致聚类结果不理想。为此,本文提出一种基于Tsallis熵的近似差分隐私K-means算法。针对质心选择的随机性问题,提出Tsallis熵对属性赋权的策略来优化对象间的欧氏距离,然后对比各对象到唯一随机初始质心的赋权欧式距离来确定其余初始质心,使算法在减少随机选择初始质心的同时,提高模型准确率;在此基础上,针对噪声添加不均衡的问题,提出一种能够平衡信噪比的隐私预算分配策略,然后对迭代质心加入高斯扰动,使算法在不增加计算复杂度的情况下满足(ε,δ)-差分隐私保护,同时提升扰动结果的准确性;最后在四个真实数据集上对算法进行有效性评价。实验结果表明,所提出的算法能够在保证用户隐私安全的同时实现高效用的聚类。When using the K-means algorithm to cluster user information,there is a risk of privacy leakage.Differential privacy protection technology can provide strict privacy protection.However,most of the current K-means algorithms that meet differential privacy have problems of random selection of centroids and imbalanced noise addition when processing multi-dimensional data,which leads to unsatisfactory clustering results.To this end,this paper proposes an approximate differential privacy K-means algorithm based on Tsallis entropy.Aiming at the randomness problem of centroid selection,a strategy of attribute weighting by Tsallis entropy is proposed to optimize the Euclidean distance between objects,and then the weighted Euclidean distance between each object and the unique random initial centroid is compared to determine the remaining initial centroids.This allows the algorithm to reduce the random selection of the initial centroid while improving the accuracy of the model;on this basis,a privacy budget allocation strategy that can balance the signal-to-noise ratio is proposed for the problem of noise imbalance,and then add Gaussian perturbation to the iterative centroid,so that the algorithm satisfies(e,d)-differential privacy protection without increasing the time complexity,and at the same time improves the accuracy of the perturbation results;finally,the effectiveness of the algorithm is evaluated on four real data sets.Experimental results show that the proposed algorithm can achieve efficient clustering while ensuring user privacy.
关 键 词:近似差分隐私 高斯机制 TSALLIS熵 K-MEANS聚类 数据挖掘
分 类 号:TP391[自动化与计算机技术—计算机应用技术]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.7