Convolution optimization algorithm based on 3D block matrices

Authors: WU Feng-huan (吴丰桓), TANG Chun-ming (唐春明) [1]

Affiliation: [1] School of Mathematics and Information Science, Guangzhou University, Guangzhou, Guangdong 510006, China

Source: Journal of Guangzhou University (Natural Science Edition), 2025, No. 1, pp. 9-20 (12 pages)

Funding: National Natural Science Foundation of China (Grant No. 12171114).

Abstract: Convolution is a core component of convolutional neural networks (CNNs), and its performance strongly affects how efficiently a network runs. Current convolution optimizations target two aspects: computation speed and memory usage. MEC is a memory-efficient convolution acceleration method that arranges the input image compactly into a two-dimensional matrix, reducing the memory overhead of the intermediate matrix. For large inputs, however, MEC produces multiple tall and narrow two-dimensional block matrices, which cannot reach the peak performance of matrix multiplication and therefore lower computational efficiency. This paper proposes CMEC, a convolution optimization algorithm based on three-dimensional block matrices. First, a three-dimensional window slides over the original image to gather data, reorganizing the input and the convolution kernel into three-dimensional intermediate matrices. Second, the three-dimensional matrix multiplication between the input block matrices and the kernel is computed in parallel, using a highly optimized matrix acceleration library to increase speed. Finally, the result is converted into the standard convolution output format. Experimental results show that, compared with MEC, CMEC uses the same amount of intermediate-matrix memory while improving average performance by 61% for a single convolutional layer on the CPU, by up to 71% on the GPU, and by at least 56% in multi-layer convolutional neural networks.
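The MEC lowering step, and one plausible reading of how CMEC regroups its blocks, can be sketched in a few lines. This is a minimal sketch under stated assumptions: a single image in HWC layout, stride 1, no padding, and plain Python lists instead of a tensor library; the function names `direct_conv`, `mec_lower`, and `batched_conv` are hypothetical. The abstract does not specify CMEC's exact 3D layout, so treating the oh overlapping windows of the MEC matrix as one (oh, ow, kh·kw·ic) batch is an assumption, not the authors' algorithm.

```python
# Minimal sketch: MEC-style lowering + a batched ("3D") block multiply.
# Assumptions (not from the paper): one image in HWC layout, stride 1,
# no padding, nested Python lists; all function names are hypothetical.

def direct_conv(img, ker):
    """Reference convolution (cross-correlation, as used in CNNs).
    img: ih x iw x ic, ker: kh x kw x ic x oc -> oh x ow x oc."""
    ih, iw, ic = len(img), len(img[0]), len(img[0][0])
    kh, kw, oc = len(ker), len(ker[0]), len(ker[0][0][0])
    oh, ow = ih - kh + 1, iw - kw + 1
    return [[[sum(img[h + i][w + j][c] * ker[i][j][c][o]
                  for i in range(kh) for j in range(kw) for c in range(ic))
              for o in range(oc)]
             for w in range(ow)]
            for h in range(oh)]

def mec_lower(img, kw):
    """MEC-style lowering: row w holds the full-height strip img[:, w:w+kw, :]
    flattened row by row, so L[w][r*kw*ic + j*ic + c] == img[r][w+j][c]."""
    ih, iw, ic = len(img), len(img[0]), len(img[0][0])
    ow = iw - kw + 1
    return [[img[r][w + j][c] for r in range(ih) for j in range(kw) for c in range(ic)]
            for w in range(ow)]

def batched_conv(img, ker):
    ih, ic = len(img), len(img[0][0])
    kh, kw, oc = len(ker), len(ker[0]), len(ker[0][0][0])
    oh = ih - kh + 1
    L = mec_lower(img, kw)   # ow x (ih*kw*ic): the only intermediate matrix
    depth = kh * kw * ic     # inner dimension of each block product
    step = kw * ic           # column offset between consecutive blocks in L
    # Kernel flattened in the same (i, j, c) order: depth x oc.
    kmat = [[ker[i][j][c][o] for o in range(oc)]
            for i in range(kh) for j in range(kw) for c in range(ic)]
    # Block b is the ow x depth window L[:, b*step : b*step + depth].
    # Viewing the oh blocks as one (oh, ow, depth) tensor turns the whole
    # convolution into a single batched multiply with kmat. Here the windows
    # are read by index arithmetic rather than copied.
    out = []
    for b in range(oh):      # batch axis: blocks are independent (parallelizable)
        base = b * step
        out.append([[sum(row[base + k] * kmat[k][o] for k in range(depth))
                     for o in range(oc)]
                    for row in L])
    return out               # already oh x ow x oc (HWC)

if __name__ == "__main__":
    img = [[[r * 5 + col, 1] for col in range(5)] for r in range(4)]   # 4x5x2
    ker = [[[[1, 2], [0, -1]] for _ in range(3)] for _ in range(2)]    # 2x3x2x2
    res = batched_conv(img, ker)
    assert res == direct_conv(img, ker)
    print(len(res), len(res[0]), len(res[0][0]))  # 3 3 2
```

The point the sketch illustrates is that the only intermediate is the MEC matrix (ow × ih·kw·ic), so the batched view adds no memory. A library-backed version could hand all oh block products to one batched GEMM call instead of oh separate tall-and-narrow ones; whether CMEC does exactly this is not stated in the abstract.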

Keywords: convolution optimization; three-dimensional block matrix; data rearrangement

CLC number: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]
