Optimization of 3D Convolutional Forward Operators Based on Domestic Accelerators

Original title: 基于国产加速器的三维卷积前向算子优化

Authors: JI Chenchen (姬晨晨); CHEN Yongqing (陈永青); HAN Mengzhi (韩孟之)

Affiliations: [1] School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, Henan, China; [2] Dawning Information Industry (Beijing) Co., Ltd., Beijing 100193, China

Source: Computer Engineering (《计算机工程》), 2025, No. 2, pp. 250-258 (9 pages)

Funding: National Key Research and Development Program of China (2021YFB0300200).

Abstract: Three-dimensional convolutional neural networks (3D CNNs) are used in an increasingly wide range of applications. They can extract richer and more discriminative features from raw data, which makes them important for 3D data processing, feature extraction, and practical applications. However, moving from two-dimensional (2D) to 3D data causes both the data volume and the computational cost of convolution to grow exponentially, increasing the demand for computing resources and time. Training and inference therefore become more time-consuming, particularly on large-scale 3D data. To address these problems, this study proposes an implicit convolution algorithm based on a domestic accelerator that optimizes the forward computation of 3D convolution. First, the algorithm combines hardware characteristics with parallelization and uses indices to directly access the addresses of the data required for computation, without allocating new memory, which considerably reduces memory overhead. Second, because the domestic accelerator has a highly parallel computing structure and abundant computing resources suited to large-scale data and complex computing tasks, a series of heterogeneous parallel optimizations tailored to its computing power and architecture is applied to accelerate the 3D convolution forward operator and improve computational efficiency and performance. Experimental results show that the self-developed operator far exceeds the best performance of the existing operators on the domestic computing platform, and that in most cases its energy efficiency ratio relative to the NVIDIA V100 reaches 70% or higher.
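
The index-based access described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, which targets a domestic accelerator and applies parallel optimizations that are not shown here. The sketch assumes flat buffers in NCDHW layout, a filter layout of (K, C, KD, KH, KW), and a single stride and padding value shared by all three spatial dimensions; the function name `conv3d_implicit` and its parameters are hypothetical. Instead of materializing an im2col matrix, each GEMM row and reduction index is decoded into an input address on the fly.

```python
def conv3d_implicit(x, wt, x_shape, w_shape, stride=1, pad=0):
    """Forward 3D convolution as an implicit GEMM (illustrative sketch).

    x:  flat list, layout (N, C, D, H, W)
    wt: flat list, layout (K, C, KD, KH, KW)
    Returns (flat output list, output shape (N, K, OD, OH, OW)).
    """
    N, C, D, H, W = x_shape
    K, _, KD, KH, KW = w_shape
    OD = (D + 2 * pad - KD) // stride + 1
    OH = (H + 2 * pad - KH) // stride + 1
    OW = (W + 2 * pad - KW) // stride + 1

    gemm_m = N * OD * OH * OW   # GEMM rows: one per output position
    gemm_k = C * KD * KH * KW   # reduction length: one per filter tap
    out = [0.0] * (N * K * OD * OH * OW)

    for m in range(gemm_m):
        # Decode the GEMM row index into an output coordinate.
        n, rem = divmod(m, OD * OH * OW)
        od, rem = divmod(rem, OH * OW)
        oh, ow = divmod(rem, OW)
        for k_out in range(K):
            acc = 0.0
            for r in range(gemm_k):
                # Decode the reduction index into a filter tap.
                c, rem = divmod(r, KD * KH * KW)
                kd, rem = divmod(rem, KH * KW)
                kh, kw = divmod(rem, KW)
                d = od * stride - pad + kd
                h = oh * stride - pad + kh
                w_pos = ow * stride - pad + kw
                # Out-of-range taps fall in the zero padding and are skipped,
                # so no padded copy of the input is ever allocated.
                if 0 <= d < D and 0 <= h < H and 0 <= w_pos < W:
                    x_idx = (((n * C + c) * D + d) * H + h) * W + w_pos
                    acc += x[x_idx] * wt[k_out * gemm_k + r]
            out[(((n * K + k_out) * OD + od) * OH + oh) * OW + ow] = acc
    return out, (N, K, OD, OH, OW)
```

The only buffers allocated are the input, the filter, and the output. An explicit im2col approach would additionally build a matrix of `gemm_m * gemm_k` elements, which is the memory overhead the index mechanism avoids. On an accelerator, the outer loops over `m` and `k_out` would typically be distributed across threads; the sequential loops here only show the index arithmetic.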

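Keywords: 3D convolution; domestic accelerator; implicit convolution algorithm; index mechanism; forward operator optimization; parallel optimization algorithm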

Classification code: TP338.6 [Automation and Computer Technology: Computer System Architecture]
