Research on the fast implementation method of Winograd transposed convolution

Authors: LI Zhao (李钊)[1], HUANG Chengcheng (黄程程), HE Yizhi (何益智), SU Xiaojie (苏晓杰)[2]

Affiliations: [1] School of Computer Science and Technology, Shandong University of Technology, Zibo, Shandong 255000, China; [2] School of Automation, Chongqing University, Chongqing 400044, China

Source: Journal of Xidian University (《西安电子科技大学学报》), 2023, No. 6, pp. 148-160 (13 pages)

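Funding: National Key Research and Development Program of China (2022YFE0107300); Youth Innovation Team Development Program of Higher Education Institutions of Shandong Province (2019KJN048).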

Abstract: Winograd transposed convolution is a widely used convolution acceleration method on field-programmable gate arrays (FPGAs). It avoids the zero-padding problem of transposed convolution by grouping the computation and then performing Winograd convolution on each group. However, this approach requires grouping operations on the input feature map and the convolution kernel, and the results must be reorganized to produce the complete output feature map; the complex element-coordinate calculations increase design complexity. To address these problems, a Winograd transposed convolution method based on a unified transformation matrix is proposed. The unified transformation matrix replaces the grouping of the input feature map and the convolution kernel, and effectively resolves the problems of overlapping summation, zero padding, kernel flipping, and decomposition and reorganization. Guided by this method, and combining data reuse, double buffering, and pipelining, a transposed convolution accelerator is designed on an FPGA. A Gaussian-Poisson generative adversarial network is used for experimental verification, and the design is compared comprehensively with mainstream transposed convolution methods. Experimental results show that the proposed method effectively reduces resource consumption and power consumption, and that the effective performance of the accelerator is about 1.13× to 23.92× higher than that of existing transposed convolution methods.
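For illustration only, here is a minimal 1D sketch of the conventional grouping approach the abstract describes. It is not the authors' unified-transformation-matrix method or their FPGA design, which the abstract does not specify. The sketch assumes a stride s and a kernel of length 3s, so that each phase sub-kernel w[p::s] has 3 taps and can use the standard Winograd F(2,3) transforms (Lavin and Gray form). All function names are hypothetical. The flip, zero-padding, phase-split, and interleaving steps correspond to the kernel flipping, zero padding, decomposition, and reorganization that the abstract lists as problems.

```python
# Winograd F(2,3) transforms (Lavin & Gray form): Y = A^T [(G g) * (B^T d)]
BT = [[1, 0, -1, 0],
      [0, 1, 1, 0],
      [0, -1, 1, 0],
      [0, 1, 0, -1]]
G = [[1.0, 0.0, 0.0],
     [0.5, 0.5, 0.5],
     [0.5, -0.5, 0.5],
     [0.0, 0.0, 1.0]]
AT = [[1, 1, 1, 0],
      [0, 1, -1, -1]]


def matvec(m, v):
    """Dense matrix-vector product on plain lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def winograd_f23(d, g):
    """Two outputs of sum_r d[t+r]*g[r] (t = 0, 1) using 4 multiplies."""
    u = matvec(G, g)                    # transformed 3-tap kernel (4 values)
    v = matvec(BT, d)                   # transformed 4-element input tile
    m = [a * b for a, b in zip(u, v)]   # element-wise products: the only multiplies
    return matvec(AT, m)


def correlate_winograd(x, g):
    """Valid 1D correlation of x with a 3-tap kernel, tiled as F(2,3)."""
    n_out = len(x) - 2
    xp = list(x) + [0.0] * (n_out % 2)  # pad so the last 4-element tile is complete
    out = []
    for t in range(0, n_out, 2):
        out.extend(winograd_f23(xp[t:t + 4], g))
    return out[:n_out]


def transposed_conv_direct(x, w, s):
    """Reference: scatter each input into the output (overlapping summation)."""
    y = [0.0] * ((len(x) - 1) * s + len(w))
    for i, xi in enumerate(x):
        for k, wk in enumerate(w):
            y[i * s + k] += xi * wk
    return y


def transposed_conv_grouped(x, w, s):
    """Stride-s transposed convolution split into s phase sub-kernels w[p::s].

    Assumes len(w) == 3 * s so that every phase maps onto F(2,3).
    """
    assert len(w) == 3 * s
    y = [0.0] * ((len(x) - 1) * s + len(w))
    padded = [0.0, 0.0] + list(x) + [0.0, 0.0]  # zero padding for a "full" convolution
    for p in range(s):
        sub = w[p::s]                           # decomposition: phase-p sub-kernel (3 taps)
        flipped = sub[::-1]                     # convolution = correlation with flipped kernel
        phase_out = correlate_winograd(padded, flipped)  # len(x) + 2 outputs
        for q, val in enumerate(phase_out):
            y[q * s + p] = val                  # reorganization: interleave phases
    return y
```

For example, `transposed_conv_grouped([1.0, 2.0, 3.0], [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 2)` should match `transposed_conv_direct` with the same arguments. Each output phase is an ordinary convolution of the input with one sub-kernel, which is why the grouped form needs no zero-inserted input. The price is the index bookkeeping visible in the last loop. 2D FPGA designs typically use the analogous F(2×2, 3×3) tiles.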

Keywords: unified transformation matrix; Winograd transposed convolution; field-programmable gate array; accelerator

CLC number: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
