Distributed Training Communication Optimization Method Based on Adaptive Hierarchical Gradient Compression


Authors: WANG Xiaoxiao; ZHU Xiaojuan[1] (School of Computer Science and Engineering, Anhui University of Science & Technology, Huainan 232001, China)

Affiliation: [1] School of Computer Science and Engineering, Anhui University of Science & Technology, Huainan 232001, Anhui, China

Source: Journal of Hubei Minzu University: Natural Science Edition, 2025, No. 1, pp. 34-40 (7 pages)

Funding: Key Project of Provincial Natural Science Research of Anhui Higher Education Institutions (KJ2020A0300).

Abstract: In distributed machine learning, frequent transmission of parameters and gradients between multiple computing nodes and parameter server nodes causes high communication overhead and low model training efficiency. To address this problem, a communication optimization method based on adaptive layered gradient compression (ALGC) was proposed. First, an appropriate compression threshold was set for each layer of the neural network, and layers exceeding this threshold were selectively compressed. Second, a sparsification threshold was set separately for each layer selected for compression and adjusted dynamically, achieving adaptive compression of each layer's gradient transmission. Finally, computation and communication were overlapped, and the parameter server aggregated the gradients and gradient residuals of each layer to update the global model. The results showed that the ALGC method reached a training accuracy of up to 95.07% and achieved the shortest convergence time and the largest speedup ratio. The ALGC method plays an important role in improving model training speed and reducing communication overhead while maintaining model training accuracy.
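The three steps in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not give the actual rules, so several points are assumptions: a layer is "selected" when its element count exceeds a hypothetical `min_size`; the per-layer sparsification threshold is halved or doubled to approach a hypothetical `target_density`; residuals are kept on the worker and added to the next gradient (standard error feedback); and the overlap of computation and communication is omitted. All names (`Worker`, `sparsify`, `server_update`) are illustrative.

```python
from typing import Dict, List, Tuple

SparseGrad = List[Tuple[int, float]]  # (index, value) pairs that are transmitted


def sparsify(grad: List[float], residual: List[float], threshold: float):
    """Send entries whose accumulated magnitude reaches the threshold; keep the rest."""
    acc = [g + r for g, r in zip(grad, residual)]
    kept = [(i, v) for i, v in enumerate(acc) if abs(v) >= threshold]
    kept_idx = {i for i, _ in kept}
    new_residual = [0.0 if i in kept_idx else v for i, v in enumerate(acc)]
    return kept, new_residual


class Worker:
    def __init__(self, min_size: int, threshold: float, target_density: float):
        self.min_size = min_size              # assumed layer-selection rule
        self.init_threshold = threshold       # starting sparsification threshold
        self.target_density = target_density  # assumed adaptation target
        self.thresholds: Dict[str, float] = {}
        self.residuals: Dict[str, List[float]] = {}

    def encode(self, grads: Dict[str, List[float]]) -> Dict[str, tuple]:
        msg = {}
        for name, g in grads.items():
            if len(g) <= self.min_size:       # small layer: send uncompressed
                msg[name] = ("dense", list(g))
                continue
            thr = self.thresholds.get(name, self.init_threshold)
            res = self.residuals.get(name, [0.0] * len(g))
            kept, self.residuals[name] = sparsify(g, res, thr)
            density = len(kept) / len(g)
            # Hypothetical adaptation: raise the threshold if too much was sent,
            # lower it if too little was sent.
            if density > self.target_density:
                thr *= 2.0
            elif density < self.target_density:
                thr /= 2.0
            self.thresholds[name] = thr
            msg[name] = ("sparse", kept)
        return msg


def server_update(params: Dict[str, List[float]], messages: List[dict], lr: float) -> None:
    """Parameter server: average the per-layer updates from all workers, then apply SGD."""
    n = len(messages)
    for name, w in params.items():
        total = [0.0] * len(w)
        for m in messages:
            kind, data = m[name]
            pairs = enumerate(data) if kind == "dense" else data
            for i, v in pairs:
                total[i] += v
        for i in range(len(w)):
            w[i] -= lr * total[i] / n
```

In this sketch, entries that fall below the threshold are not lost: they accumulate in `residuals` and are transmitted once their sum becomes large enough, which is why sparsification of this kind can preserve accuracy while reducing traffic.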

Keywords: distributed machine learning; gradient compression; parameter server; sparsification; communication optimization

CLC number: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]

 
