Authors: 姬晨晨 (JI Chenchen); 陈永青 (CHEN Yongqing); 韩孟之 (HAN Mengzhi)
Affiliations: [1] School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, Henan, China; [2] Dawning Information Industry (Beijing) Co., Ltd., Beijing 100193, China
Source: Computer Engineering (《计算机工程》), 2025, No. 2, pp. 250-258 (9 pages)
Funding: National Key Research and Development Program of China (2021YFB0300200).
Abstract: Three-dimensional convolutional neural networks (3D CNNs) are being applied in an increasingly wide range of scenarios. They can extract richer and more discriminative features from raw data, which makes them important for 3D data processing, feature extraction, and practical applications. However, moving from two-dimensional (2D) to 3D data causes both the data volume and the computation of the convolution operation to grow exponentially, with a corresponding rise in the demand for computing resources and time. Training and inference therefore become more time-consuming, especially on large-scale 3D data. To address these problems, this study proposes an implicit convolution algorithm based on a domestic (China-made) accelerator that optimizes the forward computation of 3D convolution. First, the algorithm combines hardware characteristics with a parallelization approach and uses indices to directly access the addresses of the data to be computed, so no additional memory needs to be allocated and memory overhead is greatly reduced. Second, because the domestic accelerator has a highly parallel computing structure and abundant computing resources suited to large-scale data and complex computing tasks, the algorithm applies a series of heterogeneous parallel optimizations tailored to the accelerator's computing power and architecture, which speeds up the 3D convolution forward operator and improves computational efficiency and performance. Experimental results show that the self-developed operator far exceeds the best performance of the existing operators on the domestic computing platform, and that in most cases its energy efficiency ratio relative to the NVIDIA V100 reaches 70% or higher.
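Keywords: 3D convolution; domestic accelerator; implicit convolution algorithm; index mechanism; forward operator optimization; parallel optimization algorithm
Classification No.: TP338.6 [Automation and Computer Technology: Computer System Architecture]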
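The abstract describes using indices to reach the required input addresses directly instead of allocating a separate buffer. This record does not include the authors' kernel, so the following is only a minimal NumPy sketch of the general implicit-GEMM idea under stated assumptions: an NCDHW input layout, a KCTRS filter layout, and the hypothetical function names `conv3d_implicit` and `conv3d_reference`. It does not model the accelerator-specific heterogeneous parallel optimizations mentioned in the abstract.

```python
import numpy as np

def conv3d_implicit(x, w, stride=1, pad=0):
    """3D convolution forward pass as an implicit GEMM (illustrative sketch).

    x: input,  shape (N, C, D, H, W)   -- layout is an assumption
    w: filter, shape (K, C, T, R, S)   -- layout is an assumption
    No im2col matrix and no padded copy of x are materialized; each
    operand is fetched through a flat address computed from indices.
    """
    N, C, D, H, W = x.shape
    K, _, T, R, S = w.shape
    OD = (D + 2 * pad - T) // stride + 1
    OH = (H + 2 * pad - R) // stride + 1
    OW = (W + 2 * pad - S) // stride + 1

    x_flat = x.ravel()                    # flat view (no copy if x is contiguous)
    w_mat = w.reshape(K, C * T * R * S)   # GEMM left operand
    y = np.zeros((N, K, OD, OH, OW), dtype=x.dtype)

    # Decode every reduction index k into (channel, kernel offsets) once.
    c, t, r, s = np.unravel_index(np.arange(C * T * R * S), (C, T, R, S))

    for n in range(N):
        for od in range(OD):
            for oh in range(OH):
                for ow in range(OW):
                    # Input coordinates touched by this output position.
                    d = od * stride - pad + t
                    h = oh * stride - pad + r
                    v = ow * stride - pad + s
                    valid = ((d >= 0) & (d < D) & (h >= 0) & (h < H)
                             & (v >= 0) & (v < W))
                    # Flat address into x; out-of-range taps read slot 0
                    # and are then zeroed, which emulates zero padding.
                    addr = (((n * C + c) * D + d) * H + h) * W + v
                    col = np.where(valid, x_flat[np.where(valid, addr, 0)], 0)
                    y[n, :, od, oh, ow] = w_mat @ col
    return y

def conv3d_reference(x, w, stride=1, pad=0):
    """Straightforward sliding-window version, used only as a check."""
    xp = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad), (pad, pad)))
    N, C, D, H, W = xp.shape
    K, _, T, R, S = w.shape
    OD = (D - T) // stride + 1
    OH = (H - R) // stride + 1
    OW = (W - S) // stride + 1
    y = np.zeros((N, K, OD, OH, OW), dtype=x.dtype)
    for od in range(OD):
        for oh in range(OH):
            for ow in range(OW):
                patch = xp[:, :,
                           od * stride:od * stride + T,
                           oh * stride:oh * stride + R,
                           ow * stride:ow * stride + S]
                y[:, :, od, oh, ow] = np.einsum('nctrs,kctrs->nk', patch, w)
    return y
```

Calling `conv3d_implicit(x, w, stride=1, pad=1)` on random arrays and comparing against `conv3d_reference` with `np.allclose` is a quick way to check the index arithmetic. In this sketch the only temporary is one column of length C·T·R·S per output position, whereas an explicit im2col approach would store that column for every output position at once.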