A Federated Learning Method for Solving the Data Heterogeneity Problem  Cited by: 6

Effective method to solve problem of data heterogeneity in federated learning


Authors: Zhang Hongyan (张红艳); Zhang Yu (张玉); Cao Canming (曹灿明) (College of Information Science & Technology, Zhengzhou Normal University, Zhengzhou 450044, China; School of Computer Science & Technology, Tiangong University, Tianjin 300387, China)

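Affiliations: [1] College of Information Science and Technology, Zhengzhou Normal University, Zhengzhou 450044 [2] School of Computer Science and Technology, Tiangong University, Tianjin 300387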

Source: Application Research of Computers (《计算机应用研究》), 2024, No. 3, pp. 713-720 (8 pages)

Funding: Supported by the National Natural Science Foundation of China (61972456, 62172298).

Abstract: Federated learning is a framework for obtaining machine learning models without centralized data training. Source data never leaves the local node, which reduces the risk of privacy leakage, while each node still obtains an optimized trained model. However, because nodes differ in identity, behavior, environment, and other respects, their data distributions are unbalanced, which may cause the model's performance to deviate substantially across devices; this is the data heterogeneity problem. To address it, this paper proposes a data-sharing model-parameter clustering algorithm based on node optimization, which applies clustering and data sharing to the federated learning system at the same time. The method both effectively reduces the impact of data heterogeneity on federated learning and accelerates the convergence of local models. The paper also designs a method for assessing the degree of convergence of the global shared model, which is used to determine when to cluster the nodes. Finally, experiments and performance analysis on the EMNIST and CIFAR-10 datasets verify the effect of the sharing ratio on the convergence speed and accuracy of each node, and further compare the accuracy of each node before and after clustering and data sharing are applied to federated learning together. The experimental results show that the convergence speed and accuracy of each node improve once data sharing is introduced, and that when clustering and data sharing are both introduced into federated learning training, accuracy is 10%~15% higher than with the FedAvg algorithm, indicating that the method is effective against data heterogeneity in federated learning.
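
For orientation, the FedAvg baseline named in the abstract aggregates client models by a sample-count-weighted average of their parameters. The sketch below shows that aggregation step in plain Python, together with one possible way to test whether a global model has stopped changing. The abstract does not describe the authors' actual convergence measure or clustering procedure, so `has_converged`, its L2-distance criterion, and the `tol` threshold are hypothetical illustrations, not the paper's method.

```python
from typing import List, Sequence


def fedavg(client_params: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> List[float]:
    """FedAvg aggregation: average client parameter vectors,
    weighting each client by its number of local samples."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(n * p[i] for p, n in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]


def has_converged(prev: Sequence[float], curr: Sequence[float],
                  tol: float = 1e-3) -> bool:
    """Hypothetical convergence check (an assumption, not from the paper):
    the L2 distance between successive global models falls below tol."""
    dist = sum((a - b) ** 2 for a, b in zip(prev, curr)) ** 0.5
    return dist < tol


# Two clients with 1 and 3 local samples respectively.
global_model = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

In a scheme like the one the abstract outlines, a check of this kind could decide when the server stops plain averaging and starts grouping nodes; the grouping itself is not sketched here because the abstract gives no details of it.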

Keywords: federated learning; data sharing; clustering; global shared model convergence; data heterogeneity

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]

 
