Authors: WANG Xiao-xiao; ZHU Xiao-juan[1] (School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan 232001, China)
Affiliation: [1] School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan 232001, Anhui, China
Source: Journal of Liaodong University (Natural Science Edition), 2024, No. 4, pp. 283-290 (8 pages)
Fund: Key Project of Natural Science Research in Universities of Anhui Province (KJ2020A0300).
Abstract: In distributed machine learning, resource heterogeneity and resource instability tend to cause the straggler problem, which makes it difficult for existing parallel strategies to balance synchronization lag against stale gradients, resulting in high synchronization overhead and reduced overall training efficiency. To address this, an adaptive synchronous parallel strategy for distributed machine learning is proposed. First, stragglers are identified from each compute node's parameter version and its training delay time. Second, the parameter server determines the state of each compute node by comparing the version difference between the latest and oldest parameters against a threshold. Finally, based on mini-batch stochastic gradient descent, different global model parameter update rules are applied adaptively to compute nodes in different states. Experimental results show that, compared with other parallel strategies, the proposed strategy reduces convergence time by 9.61%-41.15% and improves accuracy by up to 3.29%.
Classification Number: TP391.41 [Automation and Computer Technology - Computer Application Technology]
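To make the version-gap mechanism in the abstract concrete, the sketch below simulates a parameter server that tracks the parameter version held by each compute node, classifies a node as a straggler when the gap between the newest and oldest versions reaches a threshold, and applies a state-dependent update rule to mini-batch SGD gradients. This is a minimal illustration based only on the abstract: the name AdaptiveParameterServer, the version_threshold value, and the 1/(staleness + 1) down-weighting of stale gradients are assumptions, not the paper's exact rules, and the training-delay-time signal the paper also uses for straggler identification is omitted here.

import numpy as np

class AdaptiveParameterServer:
    def __init__(self, dim, n_workers, version_threshold=4, lr=0.01):
        self.weights = np.zeros(dim)        # global model parameters
        self.versions = [0] * n_workers     # parameter version currently held by each node
        self.threshold = version_threshold  # allowed gap between newest and oldest versions
        self.lr = lr

    def node_state(self, node_id):
        # Compare the newest/oldest version difference with the threshold
        # to decide which state the node is in.
        newest, oldest = max(self.versions), min(self.versions)
        if self.versions[node_id] == oldest and newest - oldest >= self.threshold:
            return "straggler"
        if self.versions[node_id] == newest:
            return "fast"
        return "normal"

    def push_gradient(self, node_id, grad):
        # Apply a node's mini-batch SGD gradient with a state-dependent rule:
        # stale gradients from stragglers are down-weighted (assumed rule).
        staleness = max(self.versions) - self.versions[node_id]
        scale = 1.0 / (staleness + 1) if self.node_state(node_id) == "straggler" else 1.0
        self.weights -= self.lr * scale * grad
        self.versions[node_id] = max(self.versions) + 1  # node now holds the newest parameters
        return self.weights.copy()

# Toy run: node 3 pushes only every tenth step, so its version falls behind
# and its gradients are down-weighted once the gap reaches the threshold.
rng = np.random.default_rng(0)
ps = AdaptiveParameterServer(dim=8, n_workers=4)
for step in range(30):
    node = step % 3 if step % 10 != 9 else 3
    ps.push_gradient(node, rng.normal(size=8))

Here the version gap doubles as both the straggler test and the staleness penalty, which keeps the sketch short; the paper's strategy additionally uses per-node training delay time to identify stragglers.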