Algorithm of Deep Neural Network Training Based on Multi-GPU (Cited by: 8)


Authors: 顾乃杰 (Gu Naijie)[1,2], 赵增 (Zhao Zeng)[1,2], 吕亚飞 (Lyu Yafei), 张致江 (Zhang Zhijiang)

Affiliations: [1] Laboratory of Network Computing and High-Efficiency Algorithms, School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027; [2] USTC-CAS Shenyang Institute of Computing Technology Joint Laboratory of Network and Communication, University of Science and Technology of China, Hefei 230027; [3] iFLYTEK, Hefei 230027

Published in: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2015, No. 5, pp. 1042-1046 (5 pages)

Funding: Supported by the National Science and Technology Major Project for Core Electronic Devices, High-End Generic Chips and Basic Software (2009ZX01028-002-003-005)

Abstract: Owing to its excellent recognition performance, deep learning has been attracting increasing attention in pattern recognition and machine learning. As a core component of deep neural network training, the error back-propagation algorithm has become an efficiency bottleneck for the field. This paper presents a back-propagation training algorithm based on the Tesla K10 GPU that is load-balanced and highly scalable. The algorithm takes full advantage of the transfer characteristics of PCI-E 3.0, combining peer-to-peer access with asynchronous transfers to reduce the extra overhead incurred when computational tasks are partitioned and merged. In addition, by restructuring the algorithm's workflow, data dependencies are decoupled so that more computation is available to hide the transfer phase. Experiments show that the algorithm achieves a parallel speedup of more than 1.87 on two GPUs, introduces no computational error, and therefore preserves the convergence of training, delivering close-to-ideal parallel acceleration.
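To make the transfer-hiding idea concrete, the sketch below shows the two CUDA mechanisms the abstract names: peer-to-peer access between the two GPUs of a board such as the Tesla K10, and an asynchronous peer copy issued on a dedicated stream so that a gradient transfer overlaps with the next layer's computation. This is a minimal illustration under stated assumptions, not the authors' implementation; the kernel, buffer names, and problem size are made up for the example, and error checking is omitted for brevity.

#include <cstdio>
#include <cuda_runtime.h>

// Illustrative stand-in for one layer's backward pass; the real layer
// math from the paper is not reproduced here.
__global__ void backprop_layer(float *grad, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) grad[i] *= 0.5f;
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    float *gradDev0 = nullptr, *gradCur = nullptr, *gradPrev = nullptr;
    cudaStream_t compute, xfer;

    // Allow direct GPU1 <-> GPU0 copies over PCI-E (peer-to-peer).
    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);
    cudaMalloc(&gradDev0, bytes);

    cudaSetDevice(1);
    cudaDeviceEnablePeerAccess(0, 0);
    cudaMalloc(&gradCur, bytes);
    cudaMalloc(&gradPrev, bytes);
    cudaStreamCreate(&compute);
    cudaStreamCreate(&xfer);

    // Compute the current layer's gradients on one stream ...
    backprop_layer<<<(n + 255) / 256, 256, 0, compute>>>(gradCur, n);
    // ... while the previous layer's gradients travel to GPU 0 on
    // another stream; the copy overlaps the kernel, hiding the transfer.
    cudaMemcpyPeerAsync(gradDev0, 0, gradPrev, 1, bytes, xfer);

    cudaStreamSynchronize(compute);
    cudaStreamSynchronize(xfer);
    printf("overlapped compute and peer copy finished\n");

    cudaStreamDestroy(compute);
    cudaStreamDestroy(xfer);
    cudaFree(gradCur);
    cudaFree(gradPrev);
    cudaSetDevice(0);
    cudaFree(gradDev0);
    return 0;
}

If peer access cannot be enabled (older hardware or drivers), cudaMemcpyPeerAsync still works but is staged through host memory, so the direct-transfer benefit described in the abstract is lost.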

Keywords: deep learning; neural network; GPGPU; parallel algorithm

CLC number: TP18 [Automation and Computer Technology - Control Theory and Control Engineering]

 
