Parallel Algorithm of Convolutional Neural Network in Multi-GPU Environment

Cited by: 5

Authors: 王裕民 (Wang Yumin) [1,2], 顾乃杰 (Gu Naijie) [1,2], 张孝慈 (Zhang Xiaoci) [1,2]

Affiliations: [1] Laboratory of Network Computing and High-Performance Algorithms, School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027; [2] Joint Laboratory of Network and Communication, University of Science and Technology of China and Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Hefei 230027

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2017, No. 3, pp. 536-539 (4 pages)

Abstract: With the continuing development of deep learning, convolutional neural networks have attracted growing attention for their outstanding recognition performance in fields such as image recognition and speech recognition. Research on convolutional neural networks requires extensive experimentation, yet their training typically takes a great deal of time. High-performance GPUs can accelerate the training process, but because of the GPU's particular hardware structure it is difficult to achieve a satisfactory speed-up when scaling out to multiple GPUs. This paper presents a data-parallel algorithm for multi-GPU training. Unlike the traditional client/server structure, the algorithm organizes the GPUs in a ring, which makes multi-GPU scaling easier because the system is no longer limited by the performance of a server node. In addition, GPU utilization is improved by overlapping the computation and data-transfer tasks on each individual GPU. Experiments show that with 4 GPUs the algorithm achieves speed-ups of 3.77x and 3.79x on the MNIST and CIFAR-10 datasets respectively, with no significant impact on the network's recognition accuracy.
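To make the server-free ring exchange concrete, the sketch below simulates one well-known ring-structured gradient-averaging scheme, the classic ring all-reduce (a reduce-scatter phase followed by an all-gather phase), in plain NumPy. This is an illustrative assumption, not the authors' GPU implementation: the paper does not reproduce its algorithm here, and the function name ring_allreduce, the chunking scheme, and the 4-worker setup are all choices made for this example. The key property it shares with the abstract's design is that every transfer is between ring neighbours only, so no central server node can become a bottleneck.

```python
import numpy as np

def ring_allreduce(grads):
    """Average one gradient vector per worker using the ring pattern:
    a reduce-scatter phase followed by an all-gather phase.  Every
    transfer happens between ring neighbours only; there is no
    central server."""
    n = len(grads)
    # Each worker splits its local gradient into n chunks.
    chunks = [np.array_split(g.astype(np.float64), n) for g in grads]

    # Reduce-scatter: at step t, worker i sends chunk (i - t) mod n to
    # worker (i + 1) mod n, which adds it to its own copy.  After n-1
    # steps, worker i holds the fully summed chunk (i + 1) mod n.
    for t in range(n - 1):
        received = [chunks[(i - 1) % n][(i - 1 - t) % n].copy()
                    for i in range(n)]
        for i in range(n):
            chunks[i][(i - 1 - t) % n] += received[i]

    # All-gather: circulate the completed chunks once around the ring
    # so that every worker ends up with every summed chunk.
    for t in range(n - 1):
        received = [chunks[(i - 1) % n][(i - t) % n].copy()
                    for i in range(n)]
        for i in range(n):
            chunks[i][(i - t) % n] = received[i]

    # Average and reassemble the full gradient on every worker.
    return [np.concatenate(c) / n for c in chunks]

# Usage: four simulated workers, matching the paper's 4-GPU experiments.
grads = [np.random.randn(10) for _ in range(4)]
reference = sum(grads) / 4
for g in ring_allreduce(grads):
    assert np.allclose(g, reference)
```

In a real multi-GPU system the per-chunk neighbour transfers would be issued asynchronously (for example on separate CUDA streams) so that they overlap with the back-propagation computation, which corresponds to the second optimization the abstract describes.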

Keywords: convolutional neural network; GPU; stochastic gradient descent; parallel algorithm

CLC number: TP18 [Automation and Computer Technology - Control Theory and Control Engineering]

 
