Authors: ZHU Guang-yu; XIE Zai-peng; ZHU Yue-long (College of Computer and Information, Hohai University, Nanjing 211100, China)
Source: Journal of Chinese Computer Systems, 2020, No. 11, pp. 2249-2255 (7 pages)
Funding: Supported by the Key Program of the National Natural Science Foundation of China (61832005) and the National Key Research and Development Program of China (2016YFC0402710).
Abstract: Deep neural networks are widely used in many fields. However, growing data volumes and increasing model complexity lead to a decline in training efficiency and model accuracy, and parallelizing deep neural networks can effectively mitigate this problem. Data parallelism is an effective scheme for neural network parallelization in existing distributed environments, but it can suffer from poor global-model accuracy and unbalanced load across computing nodes. Targeting these issues, this paper proposes DE-DNN, a deep neural network parallelization scheme based on the differential evolution algorithm. DE-DNN uses differential evolution to improve and optimize the key step of obtaining the global model during parallel training. Meanwhile, a batch-based self-adaptive data allocation (BSDA) algorithm is proposed to reduce the extra waiting time of computing nodes caused by the imbalance of their computing capacities during parallel training. The proposed method is implemented on the NiN deep network model and tested on the CIFAR-10 and CIFAR-100 data sets. Experimental results show that DE-DNN effectively improves the classification accuracy of the global model during parallel training and accelerates convergence, and that the BSDA algorithm allocates an appropriate amount of data to each computing node according to its computing capacity, reducing the additional waiting time incurred during training.
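The abstract describes two mechanisms: differential-evolution-based aggregation of per-node models into a global model, and throughput-proportional data allocation. The paper's actual operators, hyper-parameters, and allocation formula are not given in this record, so the following is only a hypothetical sketch of how such a scheme could work, with all function names and parameters (`de_aggregate`, `allocate_batches`, `F`, `CR`) being illustrative assumptions:

```python
import random

def de_aggregate(node_weights, fitness, F=0.5, CR=0.9, generations=10, seed=0):
    """Hypothetical differential-evolution aggregation: treat each node's
    flattened weight vector as a population member, evolve the population,
    and return the fittest vector. `fitness` maps a weight vector to a
    score to be minimized (e.g. validation loss). Not the paper's exact
    procedure; a generic DE/rand/1/bin loop for illustration."""
    rng = random.Random(seed)
    pop = [list(w) for w in node_weights]
    for _ in range(generations):
        for i, target in enumerate(pop):
            # pick three distinct other members for the mutation step
            a, b, c = rng.sample([p for j, p in enumerate(pop) if j != i], 3)
            # mutation v = a + F*(b - c), with binomial crossover at rate CR
            trial = [
                a[d] + F * (b[d] - c[d]) if rng.random() < CR else target[d]
                for d in range(len(target))
            ]
            # greedy selection: keep the trial only if it improves fitness
            if fitness(trial) < fitness(target):
                pop[i] = trial
    return min(pop, key=fitness)

def allocate_batches(total_batches, throughputs):
    """Hypothetical BSDA-style split: give each node a number of batches
    proportional to its measured throughput, handing leftover batches to
    the fastest nodes so no batch is dropped."""
    total_tp = sum(throughputs)
    shares = [int(total_batches * t / total_tp) for t in throughputs]
    leftover = total_batches - sum(shares)
    for i in sorted(range(len(throughputs)),
                    key=lambda i: -throughputs[i])[:leftover]:
        shares[i] += 1
    return shares
```

For example, with ten batches and two nodes whose measured throughputs are 3.0 and 1.0, `allocate_batches(10, [3.0, 1.0])` assigns the faster node the larger share, which is the load-balancing behavior the abstract attributes to BSDA.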
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]