A Privacy-preserving Data Alignment Framework for Vertical Federated Learning

Authors: GAO Ying; XIE Yuxin; DENG Huanghao; ZHU Zukun; ZHANG Yiyu (School of Cyber Science and Technology, Beihang University, Beijing 100191, China; Zhongguancun Laboratory, Beijing 100194, China)

Affiliations: [1] School of Cyber Science and Technology, Beihang University, Beijing 100191 [2] Zhongguancun Laboratory, Beijing 100194

Source: Journal of Electronics & Information Technology (《电子与信息学报》), 2024, No. 8, pp. 3419-3427 (9 pages)

Funding: Beijing Natural Science Foundation (M21033); Tencent WeChat Rhino-Bird Fund (this fund has no project number).

Abstract: In vertical federated learning, the clients' datasets contain overlapping sample IDs but features of different dimensions, so the data must be aligned before model training. Existing data alignment techniques generally treat the intersection of the parties' sample IDs as public information, so how to align data without leaking the ID intersection has become a pressing problem. Based on commutative encryption and homomorphic encryption, this paper constructs a privacy-preserving data alignment framework, ALIGN, consisting of data encryption, ciphertext blinding, ciphertext intersection, and feature splicing. Identical original sample IDs are transformed into identical ciphertexts after double commutative encryption, and sample features are homomorphically encrypted and then blinded. ALIGN computes the intersection over the ciphertexts of the participants' sample IDs, splices all feature data corresponding to the sample IDs in the intersection, and distributes the result to the participants in secret-shared form. Compared with existing data alignment techniques, the framework not only protects the privacy of the sample ID intersection but also securely removes information about samples outside the intersection. The security proof shows that, apart from the data size, no client can learn anything about the other party's data through data alignment, which guarantees the effectiveness of the privacy-preserving strategy. Compared with existing work, for every additional 10% of redundant data, using the alignment results produced by ALIGN shortens model training time by about 1.3 seconds and keeps model training accuracy stably above 85%. Simulation results show that aligning data with ALIGN for vertical federated learning improves the efficiency and accuracy of subsequent model training.

Keywords: vertical federated learning; data alignment; privacy preservation; commutative encryption; homomorphic encryption

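Classification: TN918 [Electronics and Telecommunications: Communication and Information Systems] TP309 [Electronics and Telecommunications: Information and Communication Engineering]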
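
To make the double commutative encryption step from the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not say which commutative scheme ALIGN uses, so modular exponentiation (a Pohlig-Hellman style cipher) is swapped in as a stand-in. The prime `P`, the helpers `keygen`, `hash_to_group`, `encrypt`, `share`, and `reconstruct`, and the sample IDs are all assumptions made for illustration. Unlike ALIGN, this toy openly reveals which ciphertexts match, and it omits the homomorphic encryption and blinding of features; the last two helpers only illustrate what handing out a value "in secret-shared form" can look like with additive shares.

```python
import hashlib
import secrets
from math import gcd

# Toy parameter (assumption): a 127-bit Mersenne prime keeps the demo fast.
# A real deployment would use a vetted, much larger group.
P = 2**127 - 1
MOD = 2**64  # ring for additive secret sharing (assumption)


def keygen() -> int:
    """Pick a secret exponent that is invertible modulo P - 1."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if gcd(k, P - 1) == 1:
            return k


def hash_to_group(sample_id: str) -> int:
    """Map a sample ID to an element of {1, ..., P - 1}."""
    digest = hashlib.sha256(sample_id.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def encrypt(value: int, key: int) -> int:
    """Commutative encryption: E_k(x) = x^k mod P."""
    return pow(value, key, P)


def share(value: int) -> tuple[int, int]:
    """Split a value into two additive shares modulo MOD."""
    r = secrets.randbelow(MOD)
    return r, (value - r) % MOD


def reconstruct(s0: int, s1: int) -> int:
    """Recombine two additive shares."""
    return (s0 + s1) % MOD


# Hypothetical parties A and B with overlapping sample IDs.
key_a, key_b = keygen(), keygen()
ids_a = ["u1", "u2", "u3", "u4"]
ids_b = ["u3", "u4", "u5"]

# A's IDs are encrypted by A and then B; B's IDs by B and then A.
# Because (x^a)^b = (x^b)^a mod P, equal IDs yield equal ciphertexts.
cipher_a = {encrypt(encrypt(hash_to_group(i), key_a), key_b) for i in ids_a}
cipher_b = {encrypt(encrypt(hash_to_group(i), key_b), key_a) for i in ids_b}
common = cipher_a & cipher_b
print("matching ciphertexts:", len(common))

# A feature value handed out as two shares; neither share alone reveals it.
s0, s1 = share(12345)
print("reconstructed feature:", reconstruct(s0, s1))
```

The intersection here is computed on ciphertexts only, so neither key holder needs to see the other's plaintext IDs; keeping the matched set itself hidden, as ALIGN claims to do, requires the additional blinding and secret-sharing machinery described in the paper.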
