Authors: ZHANG Xiaoyang (张晓扬); XIAO Junmin (肖俊敏); YAO Jiashu (姚家树); TAN Guangming (谭光明) [1] (High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; University of Chinese Academy of Sciences, Beijing 100049)
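Affiliations: [1] High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; [2] University of Chinese Academy of Sciences, Beijing 100049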
Source: Chinese High Technology Letters (《高技术通讯》), 2023, No. 12, pp. 1253-1264 (12 pages)
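Funding: Supported by the National Natural Science Foundation of China (62172391, 61972377, 62032023, T2125013) and the Beijing Municipal Science and Technology Program (Z231100007423002).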
Abstract: Automatic optimization of inference has long been a research focus at the intersection of artificial intelligence (AI) and computer architecture, yet few optimization approaches start from memory access. Taking a memory-access perspective, this paper studies the high time cost of automatic code optimization for convolutional neural networks (CNNs), addressing the automatic optimization of data layout (global) and of kernels (local). To analyze memory access effectively, the paper improves the modeling method of the classical red-blue pebble game and proposes a new method for estimating I/O lower bounds, which reduces the difficulty of lower-bound estimation for multi-stage composite algorithms; the I/O lower bound of convolution is then estimated with the improved model. Guided by this lower-bound result, the data flow is redesigned and the huge search space of automatic template generation is pruned in a targeted way, avoiding a large number of invalid searches. Kernel search is significantly faster than in the unoptimized search space, and the resulting kernels achieve an average 2.24× speedup over cuDNN on general convolution parameters, so kernel performance is preserved. The paper also uses a neural network to predict convolution performance under different data layouts, achieving a higher R² score than traditional machine learning models. On ResNet-18, AlexNet, and VGG-11, a hybrid layout strategy based on a data-layout backtracking algorithm and the prediction model yields 1.28×, 1.32×, and 1.29× performance improvements, respectively, over the default layout strategy.