Authors: HUANG Chengcheng (黄程程), DONG Xiaoxiao (董霄霄), LI Zhao (李钊) [1]
Affiliation: [1] School of Computer Science and Technology, Shandong University of Technology, Zibo, Shandong 255049, China
Source: Journal of Computer Applications (《计算机应用》), 2021, No. 8, pp. 2258-2264 (7 pages)
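Funding: Natural Science Foundation of Shandong Province (ZR2018LF002); Youth Innovation Team Development Program of Shandong Higher Education Institutions (2019KJN048); Zibo City School-City Integration Development Program (2018ZBXC021).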
Abstract: To address the high memory bandwidth demand, high computational complexity, long design exploration cycle, and inter-layer computing delay of cascaded convolutions in the two-dimensional Winograd convolution algorithm, a double-buffer 5×5 convolutional layer design method based on the two-dimensional Winograd algorithm is proposed. First, a column buffer structure is used for data layout, so that the overlapping data between adjacent tiles is reused and the memory bandwidth demand is reduced. Then, intermediate results that recur in the addition steps of the Winograd algorithm are found by exact search and reused, which reduces the number of additions and thus the energy consumption and design area of the accelerator system. Finally, a 6-stage pipeline structure is designed following the computation flow of the Winograd algorithm, giving an efficient computation of 5×5 convolution. Experimental results show that, with the prediction accuracy of the Convolutional Neural Network (CNN) essentially unaffected, the proposed 5×5 convolution method reduces the number of multiplications by 83% compared with conventional convolution and achieves a speedup of 5.82. Compared with building a 5×5 convolution by cascading 3×3 two-dimensional Winograd convolutions, it reduces the number of multiplications by 12%, the memory bandwidth demand by about 24.2%, and the computing time by 20%.
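The multiplication savings mentioned in the abstract come from the general Winograd minimal-filtering idea. The sketch below is not the paper's 5×5 design or its pipeline. It only illustrates the textbook one-dimensional case F(2,3), which produces two outputs of a 3-tap filter with 4 multiplications instead of 6, together with the standard per-tile multiplication counts for a nested two-dimensional F(m×m, r×r). The function names and the tile size `m` are hypothetical choices made for illustration, and the record does not state which tile size the authors used.

```python
def direct_f2_3(d, g):
    """Two outputs of a 3-tap filter over 4 inputs, computed directly (6 multiplications)."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f2_3(d, g):
    """The same two outputs via Winograd F(2,3), using 4 data multiplications."""
    # Filter transform: in a real accelerator this is precomputed once per filter.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]
    # Input transform (additions only), followed by 4 element-wise multiplications.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3
    # Output transform (additions only).
    return [m1 + m2 + m3, m2 - m3 - m4]


def mults_per_tile(m, r):
    """Multiplications per m x m output tile with an r x r filter.

    Returns (winograd, direct). The tile size m is an illustrative
    assumption, not a parameter taken from the paper.
    """
    return (m + r - 1) ** 2, (m * m) * (r * r)


if __name__ == "__main__":
    d, g = [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0]
    print(direct_f2_3(d, g), winograd_f2_3(d, g))
    print(mults_per_tile(2, 3))
```

The trade-off visible here is the one the abstract targets: multiplications drop, but the input and output transforms add extra additions, which is why reusing repeated intermediate sums matters in hardware.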
Keywords: Convolutional Neural Network (CNN); Field-Programmable Gate Array (FPGA); Winograd algorithm; double buffer; deep pipeline
Classification number: TP302.1 [Automation and Computer Technology: Computer System Architecture]