Authors: 王裕民 (Wang Yumin) [1,2], 顾乃杰 (Gu Naijie) [1,2], 张孝慈 (Zhang Xiaoci) [1,2]
Affiliations: [1] Laboratory of Network Computing and High-Efficiency Algorithms, School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027 [2] Joint Laboratory of Network and Communication, USTC and Shenyang Institute of Computing Technology (CAS), University of Science and Technology of China, Hefei 230027
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2017, No. 3, pp. 536-539 (4 pages)
Abstract: With the development of deep learning, convolutional neural networks have attracted increasing attention for their outstanding performance in fields such as image recognition and speech recognition. Research on convolutional neural networks requires extensive experimentation, yet their training process is very time-consuming. High-performance GPUs can accelerate training, but because of the GPU's special hardware structure it is difficult to achieve a satisfactory speed-up when scaling to multiple GPUs. This paper presents a data-parallel algorithm for multiple GPUs. Unlike the traditional client/server structure, the GPUs are organized in a ring, which scales out better with additional GPUs and is not limited by the performance of a server node. In addition, GPU utilization is improved by overlapping computation and data transfer on each GPU. Experiments show that with 4 GPUs the algorithm achieves speed-ups of 3.77 and 3.79 on the mnist and cifar10 datasets respectively, with no significant impact on the network's recognition accuracy.
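The abstract's ring organization can be illustrated with a minimal sketch. This is not the paper's implementation: the names (`ring_average`) and the plain-Python workers standing in for GPUs are assumptions for illustration only. The idea it shows is that each worker's gradient circulates around the ring and is accumulated by every other worker, so all workers end up with the averaged gradient without any central server node.

```python
def ring_average(gradients):
    """Average per-worker gradient vectors via ring-style exchange.

    gradients: list of equal-length lists of floats, one per worker
    (each worker stands in for one GPU). Returns one averaged copy
    per worker, as each worker would hold after the exchange.
    """
    n = len(gradients)
    dim = len(gradients[0])
    # Each worker's accumulator starts as its own local gradient.
    acc = [g[:] for g in gradients]
    # In n-1 steps, every worker adds the gradient that originated
    # `step` hops earlier in the ring, as it would arrive from its
    # neighbour; no worker ever talks to a central server.
    for step in range(1, n):
        for worker in range(n):
            src = (worker - step) % n  # origin of this step's message
            for i in range(dim):
                acc[worker][i] += gradients[src][i]
    # Every worker now holds the full sum; divide to get the average.
    return [[v / n for v in a] for a in acc]
```

On a real system each of the n-1 steps runs concurrently on all GPUs, which is why the ring avoids the server-node bottleneck of the client/server layout; the paper additionally overlaps these transfers with computation on each GPU.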
Classification: TP18 [Automation and Computer Technology — Control Theory and Control Engineering]