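Authors: CAO Xi-Yu (曹希彧); CHEN Xin (陈鑫) [1]; WEI Tong-Quan (魏同权)
Affiliation: [1] School of Computer Science and Technology, East China Normal University, Shanghai 200062
Source: Chinese Journal of Computers (《计算机学报》), 2024, No. 11, pp. 2536-2551 (16 pages)
Funding: General Program of the National Natural Science Foundation of China (62272169); Shanghai Municipal Science and Technology Major Project (2021SIIZDZX); a project of the Shanghai Collaborative Innovation Center of Trusted Industrial Internet Software.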
Abstract: In the era of artificial intelligence, RISC-V, an emerging open-source reduced instruction set architecture, has become a new platform capable of adapting to evolving deep learning models and algorithms, owing to its low power consumption, modularity, openness, and flexibility. However, when hardware resources and power are constrained, the basic RISC-V processor architecture cannot meet the high-performance computing demands of convolutional neural networks. To address this issue, this paper designs a lightweight depthwise separable convolutional neural network accelerator based on RISC-V, aiming to compensate for the limited convolution capability of RISC-V processors. The accelerator supports the two key operators of depthwise separable convolution, namely depthwise convolution and pointwise convolution, and improves resource utilization by sharing hardware structures between them. The depthwise convolution pipeline employs an efficient Winograd convolution algorithm and reduces redundancy in the transferred data by combining 2×2 data blocks into 4×4 data tiles. In addition, extending the instructions on the RISC-V processor side allows the accelerator to be configured and invoked more flexibly. Experimental results show that, compared with the basic RISC-V processor, invoking the accelerator speeds up pointwise convolution by 104.40x and depthwise convolution by 123.63x. The accelerator's performance-to-power ratio reaches 8.7 GOPS/W. The RISC-V processor combined with the accelerator offers an efficient and viable option for deploying convolutional neural networks in resource-constrained environments.
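The 4×4 data tiles mentioned in the abstract are consistent with the Winograd F(2×2, 3×3) variant, in which a 4×4 input tile and a 3×3 kernel produce a 2×2 output tile, although this record does not state which variant the accelerator uses. The following is a minimal software sketch under that assumption, using the commonly published transform matrices; the function names are hypothetical, and nothing here describes the paper's hardware pipeline.

```python
# Winograd F(2x2, 3x3) for one tile: Y = A^T [ (G g G^T) * (B^T d B) ] A
# Assumption: standard transform matrices from the Winograd fast-convolution
# literature; they are not taken from this record.
B_T = [[1, 0, -1, 0],
       [0, 1, 1, 0],
       [0, -1, 1, 0],
       [0, 1, 0, -1]]
G = [[1.0, 0.0, 0.0],
     [0.5, 0.5, 0.5],
     [0.5, -0.5, 0.5],
     [0.0, 0.0, 1.0]]
A_T = [[1, 1, 1, 0],
       [0, 1, -1, -1]]


def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def winograd_tile(d, g):
    """d: 4x4 input tile, g: 3x3 kernel -> 2x2 output tile."""
    u = matmul(matmul(G, g), transpose(G))        # transformed kernel, 4x4
    v = matmul(matmul(B_T, d), transpose(B_T))    # transformed input, 4x4
    m = [[u[i][j] * v[i][j] for j in range(4)]    # 16 element-wise products
         for i in range(4)]
    return matmul(matmul(A_T, m), transpose(A_T))  # inverse transform, 2x2


def direct_tile(d, g):
    """Reference: sliding-window (CNN-style) convolution, 36 multiplies."""
    return [[sum(d[i + p][j + q] * g[p][q]
                 for p in range(3) for q in range(3))
             for j in range(2)] for i in range(2)]


if __name__ == "__main__":
    d = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    g = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
    print(winograd_tile(d, g))
    print(direct_tile(d, g))
```

The element-wise stage needs 16 multiplications per tile, against 36 for the direct sliding-window form, which is the usual motivation for Winograd in depthwise 3×3 convolution.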
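Keywords: neural network; depthwise separable convolution; RISC-V; Winograd fast convolution; hardware acceleration
Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]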