Authors: LI Bowen (李博文), XIE Zaipeng (谢在鹏), MAO Yingchi (毛莺池), XU Yuanyuan (徐媛媛), ZHU Xiaorui (朱晓瑞), ZHANG Ji (张基) (School of Computer and Information, Hohai University, Nanjing 211100, China)
Source: Computer Engineering (《计算机工程》), 2021, No. 4, pp. 68-76, 83 (10 pages in total)
Funding: Key Program of the National Natural Science Foundation of China (61832005); National Key Research and Development Program of China (2016YFC0402710).
Abstract: The Asynchronous Stochastic Gradient Descent (ASGD) algorithm based on data parallelization requires frequent gradient exchanges between distributed computing nodes, which reduces the execution efficiency of the algorithm. This paper proposes a Synchronized Stochastic Gradient Descent (SSGD) algorithm based on distributed coding. The algorithm uses a redundant allocation strategy for computation tasks to quantify the intermediate-result transmission time of each node, thereby reducing the training time of a single batch, and lowers the total volume of data transmitted between nodes through the grouped data exchange mode of the coded data transmission strategy. Experimental results show that, with a suitable hyperparameter configuration, the proposed algorithm reduces the average distributed training time of a Deep Neural Network (DNN) and a Convolutional Neural Network (CNN) by 53.97% and 26.89% compared with the SSGD algorithm, and by 39.11% and 26.37% compared with the ASGD algorithm. It can significantly reduce the communication load of the distributed cluster while preserving the training accuracy of the neural networks.
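The abstract contrasts asynchronous gradient exchange with a synchronous, batch-aligned update. As a rough illustration of that synchronous data-parallel pattern only (not the paper's distributed-coding scheme, whose redundancy allocation and grouped exchange are not specified in this record), the following minimal Python/NumPy sketch simulates several compute nodes that each compute a local gradient on their own data shard, after which a single averaged update is applied once per batch. All function and variable names here are illustrative assumptions.

```python
# Minimal sketch of synchronous data-parallel SGD, assuming a simple
# least-squares objective; this is NOT the paper's coded-SSGD algorithm.
import numpy as np

def local_gradient(w, X, y):
    """Least-squares gradient on one node's shard: (2/n) * X^T (Xw - y)."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

def ssgd_step(w, shards, lr=0.01):
    """One synchronous step: every node's gradient is collected before a
    single averaged update, mimicking batch-aligned synchronous SGD."""
    grads = [local_gradient(w, X, y) for X, y in shards]
    return w - lr * np.mean(grads, axis=0)

rng = np.random.default_rng(0)
w_true = rng.normal(size=5)

# Split a synthetic linear-regression dataset across 4 simulated nodes.
shards = []
for _ in range(4):
    X = rng.normal(size=(64, 5))
    shards.append((X, X @ w_true + 0.01 * rng.normal(size=64)))

w = np.zeros(5)
for _ in range(200):
    w = ssgd_step(w, shards)

print("parameter error:", np.linalg.norm(w - w_true))  # converges toward w_true
```

In the paper's setting, the per-batch gradient collection shown above is the communication-heavy step; the proposed coding and grouped exchange aim to shrink exactly that cost.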
Keywords: neural network; deep learning; distributed coding; gradient descent; communication load
Classification: TP311 [Automation and Computer Technology - Computer Software and Theory]