Authors: XIAN Lin-tao; LIU Xiao-lan; WANG Gan; LIU Jian-ming[1] (Intelligent Medical Engineering Laboratory, Weifang Medical University, Weifang 261053, China)
Affiliation: [1] Intelligent Medical Engineering Laboratory, Weifang Medical University, Weifang 261053, Shandong, China
Source: Computer Engineering and Design, 2024, No. 9, pp. 2821-2827 (7 pages)
Funding: Weifang Medical University 2023 university-level research project (2023YBD005).
Abstract: To address the slow training and low resource utilization of distributed neural network training in heterogeneous environments, a heterogeneity-aware parameter server for distributed neural network training (H-PS) is proposed. Training tasks are dynamically scheduled according to the current status of each compute node, so that all nodes complete their tasks in the same amount of time and node resources are fully utilized. A communication-computation pipeline scheme is proposed: while the parameter server and the compute nodes exchange model parameters, the nodes continue model computation, further improving resource utilization. A flexible quantization scheme compresses the neural network model parameters, reducing the communication overhead between the parameter server and the compute nodes. Experiments on an emerging container cluster show that, compared with existing methods, H-PS reduces overall training time by a factor of 1.4 to 3.5.
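To make two of the abstract's mechanisms concrete, below is a minimal Python/NumPy sketch of (a) throughput-proportional task allocation, so heterogeneous workers are expected to finish a step at the same time, and (b) uniform min-max parameter quantization to cut communication volume. This is an illustration under stated assumptions, not the paper's implementation: the function names, the per-worker throughput estimate, and the fixed-bit-width quantizer are hypothetical (the paper describes its quantization as flexible, i.e., the precision may vary).

```python
import numpy as np


def allocate_batches(total_batch, throughputs):
    """Split a global batch across heterogeneous workers in proportion to a
    measured throughput estimate (samples/sec), so all workers are expected
    to finish a step at roughly the same time. `throughputs` is a
    hypothetical scheduler input; H-PS's actual node-status signal may differ."""
    throughputs = np.asarray(throughputs, dtype=np.float64)
    shares = throughputs / throughputs.sum()
    sizes = np.floor(shares * total_batch).astype(int)
    # Hand out the remainder left by flooring to the fastest workers first.
    for i in np.argsort(-throughputs)[: total_batch - sizes.sum()]:
        sizes[i] += 1
    return sizes


def quantize_uniform(params, num_bits=8):
    """Min-max uniform quantization of a float32 parameter tensor to
    `num_bits`-bit unsigned integers, plus the (offset, scale) metadata the
    receiver needs to dequantize."""
    lo, hi = params.min(), params.max()
    scale = (hi - lo) / (2 ** num_bits - 1) or 1.0  # guard against hi == lo
    q = np.round((params - lo) / scale)
    return q.astype(np.uint8 if num_bits <= 8 else np.uint16), (lo, scale)


def dequantize_uniform(q, meta):
    """Invert quantize_uniform on the receiving side."""
    lo, scale = meta
    return q.astype(np.float32) * scale + lo


if __name__ == "__main__":
    # Three workers with a ~1:2:4 speed gap share a global batch of 256.
    print(allocate_batches(256, [100.0, 200.0, 400.0]))  # -> [36 73 147], sums to 256
    w = np.random.randn(1024).astype(np.float32)
    q, meta = quantize_uniform(w)
    # 8-bit transfer is 4x smaller than float32; reconstruction error is bounded
    # by about half a quantization step.
    print(np.abs(dequantize_uniform(q, meta) - w).max())
```

The allocation step captures the scheduling intuition (equal expected finish times across unequal nodes); the quantizer captures the bandwidth trade-off (an 8-bit payload is a quarter of float32, at the cost of bounded reconstruction error). The communication-computation pipeline itself would additionally overlap these transfers with ongoing local computation.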
Keywords: distributed machine learning; heterogeneous environment; dynamic task scheduling; communication-computation parallelism; dynamic parameter quantization; deep neural network; container cluster
Classification: TP181 [Automation and Computer Technology / Control Theory and Control Engineering]