Authors: 郑恩 (ZHENG En), 白林亭 (BAI Lin-ting), 文鹏程 (WEN Peng-cheng) [1,2]
Affiliations: [1] Xi'an Aeronautics Computing Technique Research Institute, AVIC, Xi'an, Shaanxi 710000, China; [2] Key Laboratory of Airborne Missile-borne Computer Aeronautical Science and Technology, Xi'an, Shaanxi 710000, China
Source: Aeronautical Computing Technique (《航空计算技术》), 2024, No. 3, pp. 38-41, 47 (5 pages in total)
Funding: Supported by the Aeronautical Science Foundation of China (2022Z071031001).
Abstract: In deep learning inference frameworks, GEMM (general matrix multiplication) is a typical compute-intensive operator. The modules of models such as BERT, Transformer and YOLO contain large numbers of GEMM operations, so the quality of a framework's underlying GEMM implementation directly affects model inference latency. Because edge embedded platforms have limited computing power, optimizing this operator is crucial. This paper optimizes GEMM for embedded platforms using loop unrolling, OpenMP, the NEON instruction set and other methods, and evaluates the result on the domestically produced Phytium (Feiteng) D2000 embedded board running a domestic operating system. Experimental results show that the optimized operator is 43.89 times faster than the unoptimized version, demonstrating that the optimization methods are effective and can greatly reduce the inference latency of artificial intelligence models at the edge.
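The abstract names the three techniques but gives no code. The C sketch below is an illustration only, not the authors' implementation: it shows how loop unrolling, OpenMP and NEON are commonly combined in a single-precision, row-major GEMM, C = A × B, alongside a naive baseline for comparison. The function names (`gemm_naive`, `gemm_opt`, `gemm_self_check`), the i-k-j loop order and the unroll factor of 4 are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#ifdef __ARM_NEON
#include <arm_neon.h>
#endif

/* Hypothetical baseline: C[M x N] = A[M x K] * B[K x N], all row-major. */
static void gemm_naive(const float *A, const float *B, float *C,
                       int M, int N, int K) {
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++) {
            float acc = 0.0f;
            for (int k = 0; k < K; k++)
                acc += A[(size_t)i * K + k] * B[(size_t)k * N + j];
            C[(size_t)i * N + j] = acc;
        }
}

/* Hypothetical optimized sketch:
 * - rows of C are split across threads with OpenMP (ignored without -fopenmp);
 * - i-k-j order makes the inner loop walk B and C contiguously;
 * - the inner j loop is unrolled by 4, using one NEON float32x4 multiply-add
 *   when the compiler targets NEON, otherwise four scalar statements. */
static void gemm_opt(const float *A, const float *B, float *C,
                     int M, int N, int K) {
    #pragma omp parallel for
    for (int i = 0; i < M; i++) {
        float *c = C + (size_t)i * N;
        for (int j = 0; j < N; j++) c[j] = 0.0f;
        for (int k = 0; k < K; k++) {
            const float a = A[(size_t)i * K + k];
            const float *b = B + (size_t)k * N;
            int j = 0;
#ifdef __ARM_NEON
            const float32x4_t va = vdupq_n_f32(a);   /* broadcast a */
            for (; j + 4 <= N; j += 4) {
                /* vmlaq_f32(acc, x, y) returns acc + x * y, lane-wise */
                float32x4_t vc = vmlaq_f32(vld1q_f32(c + j), va, vld1q_f32(b + j));
                vst1q_f32(c + j, vc);
            }
#else
            for (; j + 4 <= N; j += 4) {              /* manual 4x unroll */
                c[j]     += a * b[j];
                c[j + 1] += a * b[j + 1];
                c[j + 2] += a * b[j + 2];
                c[j + 3] += a * b[j + 3];
            }
#endif
            for (; j < N; j++) c[j] += a * b[j];      /* leftover columns */
        }
    }
}

/* Runs both kernels on deterministic small-integer inputs and returns the
 * largest absolute difference between their outputs. */
static float gemm_self_check(int M, int N, int K) {
    float *A  = malloc(sizeof(float) * (size_t)M * K);
    float *B  = malloc(sizeof(float) * (size_t)K * N);
    float *C0 = malloc(sizeof(float) * (size_t)M * N);
    float *C1 = malloc(sizeof(float) * (size_t)M * N);
    assert(A && B && C0 && C1);
    for (size_t i = 0; i < (size_t)M * K; i++) A[i] = (float)(i % 7) - 3.0f;
    for (size_t i = 0; i < (size_t)K * N; i++) B[i] = (float)(i % 5) - 2.0f;
    gemm_naive(A, B, C0, M, N, K);
    gemm_opt(A, B, C1, M, N, K);
    float worst = 0.0f;
    for (size_t i = 0; i < (size_t)M * N; i++) {
        float d = C0[i] - C1[i];
        if (d < 0.0f) d = -d;
        if (d > worst) worst = d;
    }
    free(A); free(B); free(C0); free(C1);
    return worst;
}
```

A typical build is `gcc -O2 -fopenmp gemm.c`. On an AArch64 target such as the D2000, GCC defines `__ARM_NEON`, so the intrinsic path is compiled; elsewhere the scalar unrolled path is used. The 43.89× speedup is the paper's result for the authors' own implementation and test setup and should not be expected from this sketch.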
Keywords: inference framework; GEMM; OpenMP; NEON; Phytium (Feiteng) D2000
CLC number: V247 [Aeronautical and Astronautical Science and Technology: Flight Vehicle Design]