Authors: WU Huan; WU Jun-min[1] (School of Computer Science and Technology, University of Science and Technology of China, Hefei 230000, China)
Affiliation: [1] School of Computer Science and Technology, University of Science and Technology of China, Hefei 230000, Anhui, China
Source: Computer Engineering and Design, 2018, No. 12, pp. 3686-3691 (6 pages)
Funding: National Key Research and Development Program of China (2016YFB1000403)
Abstract: To accelerate the forward-inference speed of convolutional neural networks, an optimization strategy targeting the memory-access continuity of the convolution operation is presented. In the deep learning framework Caffe, convolution is implemented as matrix multiplication. Caffe's convolution comprises two main operations, im2col and gemm. im2col (image to columns) unrolls the input image into a matrix of patches; gemm (general matrix-matrix multiplication) then performs the matrix-matrix product. On row-major architectures, transposing the input image changes its data layout and improves the memory-access efficiency of both im2col and gemm. Experimental results show an average speedup of about 40% for the convolution operation.
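To make the im2col + gemm scheme described in the abstract concrete, here is a minimal sketch of single-channel, stride-1 convolution expressed as matrix multiplication. This is an illustrative reconstruction of the general technique, not the authors' optimized Caffe code; the function name and shapes are assumptions for the example.

```python
import numpy as np

def im2col(img, k):
    """Unfold a single-channel H x W image into a (k*k, out_h*out_w) matrix
    whose columns are the k x k patches visited by a stride-1 convolution."""
    H, W = img.shape
    out_h, out_w = H - k + 1, W - k + 1
    cols = np.empty((k * k, out_h * out_w), dtype=img.dtype)
    idx = 0
    for i in range(out_h):
        for j in range(out_w):
            # Each patch is flattened row-major into one column.
            cols[:, idx] = img[i:i + k, j:j + k].ravel()
            idx += 1
    return cols

# Convolution as gemm: the flattened kernel (a 1 x k*k row) times the
# unfolded image gives all output pixels in one matrix product.
img = np.arange(16, dtype=np.float64).reshape(4, 4)
kernel = np.ones((3, 3))          # 3x3 all-ones kernel: a sliding-window sum
out = (kernel.ravel() @ im2col(img, 3)).reshape(2, 2)
print(out)  # [[45. 54.] [81. 90.]]
```

Because the patches are laid out column by column, the subsequent gemm streams through contiguous memory; the paper's contribution is a transpose of the input layout so that im2col itself also reads memory contiguously on row-major CPUs.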
Keywords: convolutional neural network; central processing unit (CPU); transpose; acceleration; memory access; forward inference
Classification: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]