MACO: memory-based automatic code optimization of CNNs

Authors: ZHANG Xiaoyang (张晓扬); XIAO Junmin (肖俊敏); YAO Jiashu (姚家树); TAN Guangming (谭光明)

Affiliations: [1] High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; [2] University of Chinese Academy of Sciences, Beijing 100049

Source: Chinese High Technology Letters (《高技术通讯》), 2023, No. 12, pp. 1253-1264 (12 pages)

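Funding: Supported by the National Natural Science Foundation of China (62172391, 61972377, 62032023, T2125013) and the Beijing Municipal Science and Technology Program (Z231100007423002).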

Abstract: Automatic optimization of inference has long been a research focus at the intersection of artificial intelligence (AI) and computer architecture, yet few optimization approaches take memory access as their starting point. From both a global and a local perspective, this paper studies the excessive optimization time of automatic code optimization for convolutional neural networks (CNNs), addressing the automatic optimization of data layouts and kernels from the viewpoint of memory access. To analyze memory access effectively, the paper improves the modeling method of the classical red-blue pebble game and proposes a new I/O lower bound estimation method that reduces the difficulty of lower bound estimation for multi-stage composite algorithms; the I/O lower bound of convolution is then estimated with the improved model. Guided by this lower bound, the data flow is redesigned and the huge search space produced by automatic template generation is pruned in a targeted way, avoiding a large number of invalid search steps. Kernel search is significantly faster than in the unoptimized search space, and kernel performance is preserved, with an average 2.24× improvement over cuDNN under general convolution parameters. The paper also uses a neural network to predict convolution performance under different data layouts, achieving a higher R² score than traditional machine learning models. On ResNet-18, AlexNet, and VGG-11, a hybrid layout strategy based on a data layout backtracking algorithm and the prediction model improves performance by 1.28×, 1.32×, and 1.29×, respectively, over the default layout strategy.
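The abstract ranks performance predictors by their R² score. As a reminder of what that metric measures, the following is a minimal sketch of the standard coefficient of determination; the function name `r2_score_manual` and the sample values are illustrative assumptions and are not taken from the paper.

```python
def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot.

    Hypothetical helper for illustration; not the paper's code.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after applying the predictor.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are equal")
    return 1.0 - ss_res / ss_tot


# Made-up measured vs. predicted convolution latencies (arbitrary units).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
score = r2_score_manual(measured, predicted)
```

A score of 1 means the predictions match the measurements exactly, and a score of 0 means the predictor does no better than always returning the mean measurement.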

Keywords: memory optimization; artificial intelligence (AI); inference; data layout; auto-tuning

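CLC number: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]; TP311.52 [Automation and Computer Technology: Control Science and Engineering]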
