Authors: TANG Min (唐敏); ZHANG Yuhao (张宇浩); DENG Guoqiang (邓国强)
Affiliation: [1] Guangxi Colleges and Universities Key Laboratory of Data Analysis and Computation, School of Mathematics and Computing Science, Guilin University of Electronic Technology, Guilin 541004, Guangxi, China
Source: Computer Engineering (《计算机工程》), 2023, No. 4, pp. 32-42, 51 (12 pages in total)
Funding: Guangxi Science and Technology Base and Talent Special Project (AD18281024); Guilin University of Electronic Technology Graduate Education Innovation Program (2022YCXS144).
Abstract: Logistic regression is a typical machine learning algorithm that is widely used in medical diagnosis, financial forecasting, and other fields. A single user rarely has enough samples to build a high-accuracy model, while traditional centralized training leaks private data, so privacy-preserving logistic regression has attracted wide attention. Existing schemes that require interaction between users and servers incur high computation costs and a heavy communication burden. This paper proposes an efficient non-interactive logistic regression training protocol. A gradient update formula with a well-separable structure decouples the sample data from the model parameters, which guarantees one-way, single-round transmission between users and servers: users aggregate their local data, upload it to the cloud servers in secret-shared form, and can then go offline. For the training phase, a protocol based on matrix and vector operations is designed so that the servers update the parameters with the same fixed information in every iteration, reducing computation cost and communication overhead. A security analysis of the protocol and numerical experiments are also provided. Logistic regression models are trained on four real datasets from the UCI repository; the results show that, without loss of model accuracy, the protocol is 80 to 120 times more efficient than VANE, the latest privacy-preserving logistic regression scheme, and its training time is close to that in the plaintext domain.
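The abstract does not give the paper's actual update formula, so the following is only a minimal sketch of how a separable gradient could allow a one-shot upload. It assumes (this is not from the source) the first-order approximation sigmoid(z) ≈ 0.5 + 0.25z, under which the gradient becomes 0.25·(XᵀX)·w + Xᵀ(0.5 − y). The data-dependent terms A = XᵀX and b = Xᵀ(0.5 − y) are then fixed, so a user can secret-share them once and go offline. All names (`upload`, `server_step`, `train`), the two-server setting, the modulus `P`, and the fixed-point scale `S` are hypothetical choices for illustration; in this toy version the servers see `w` and the reconstructed gradient in the clear.

```python
import random

P = 2**61 - 1   # prime modulus of the share ring (illustrative choice)
S = 2**16       # fixed-point scale (illustrative choice)

def encode(x, scale):
    """Fixed-point encode a real number as an element of Z_P."""
    return round(x * scale) % P

def decode(v, scale):
    """Map a ring element back to a signed real number."""
    v = v if v <= P // 2 else v - P
    return v / scale

def share(v):
    """Split v into two additive shares modulo P.
    (random is not cryptographically secure; a real system would not use it.)"""
    r = random.randrange(P)
    return r, (v - r) % P

def user_summary(X, y):
    """A = X^T X and b = X^T (0.5 - y): the only data-dependent terms."""
    d = len(X[0])
    A = [[sum(row[j] * row[k] for row in X) for k in range(d)] for j in range(d)]
    b = [sum(row[j] * (0.5 - yi) for row, yi in zip(X, y)) for j in range(d)]
    return A, b

def upload(X, y):
    """One-shot upload: returns each server's shares of (A, 4b)."""
    A, b = user_summary(X, y)
    sA = [[share(encode(a, S)) for a in row] for row in A]
    sb = [share(encode(4 * v, S * S)) for v in b]
    s0 = ([[c[0] for c in row] for row in sA], [c[0] for c in sb])
    s1 = ([[c[1] for c in row] for row in sA], [c[1] for c in sb])
    return s0, s1

def server_step(shares, w_int):
    """Local share of 4 * gradient at scale S^2; linear in the shares,
    so no interaction with the user is needed."""
    A_i, b_i = shares
    return [(sum(a * wk for a, wk in zip(row, w_int)) + bj) % P
            for row, bj in zip(A_i, b_i)]

def train(s0, s1, d, n, lr=0.1, iters=200):
    """Gradient descent that only touches the fixed shares uploaded once."""
    w = [0.0] * d
    for _ in range(iters):
        w_int = [encode(wk, S) for wk in w]
        g0, g1 = server_step(s0, w_int), server_step(s1, w_int)
        grad = [decode((a + b) % P, S * S) / 4 / n for a, b in zip(g0, g1)]
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w

# Usage: four samples with a bias column and one feature.
X = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0], [1.0, -0.5]]
y = [1, 0, 1, 0]
s0, s1 = upload(X, y)
w = train(s0, s1, d=2, n=len(X))
```

Because each server's step is a matrix-vector product on fixed shares, the per-iteration work does not depend on the number of samples, only on the feature dimension; whether the paper achieves its reported speedup in this way is not stated in the abstract.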
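Keywords: logistic regression; privacy preserving; well-separable structure; secret sharing; vectorization
Classification number: TP309 [Automation and Computer Technology: Computer System Architecture]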