Authors: WANG Binyu; YANG Zhijia; XIE Chuang[2,3]; LIAN Lian; WANG Ying[1,2,3] (College of Information Engineering, Shenyang University of Chemical Technology, Shenyang 110142, China; Key Laboratory of Networked Control Systems, Chinese Academy of Sciences, Shenyang 110016, China; Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China)
Affiliations: [1] College of Information Engineering, Shenyang University of Chemical Technology, Shenyang 110142, Liaoning, China; [2] Key Laboratory of Networked Control Systems, Chinese Academy of Sciences, Shenyang 110016, Liaoning, China; [3] Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, Liaoning, China
Source: Microelectronics & Computer, 2024, No. 7, pp. 89-95 (7 pages)
Funding: National Key R&D Program of China (2022YFB3204501)
Abstract: Conventional Convolutional Neural Network (CNN) accelerators suffer from low hardware resource utilization when implementing convolution-layer operator reconfiguration, data reuse, and compute-resource reuse. To address this, a hardware architecture combining a dynamic register file with a reconfigurable PE array is designed; by optimizing the dataflow, the load across PE units is balanced, which in turn improves the utilization of convolution-layer computing resources. The architecture can flexibly map odd-sized convolution kernels with sizes from 0 to 11 and strides from 1 to 10, and supports multi-channel parallel convolution and input data reuse. The design is implemented in the Verilog hardware description language and functionally verified in a UVM environment. Experiments show that, when accelerating the convolutional layers of AlexNet, the peak throughput is 9.5% to 64.3% higher than that of related work, and when mapping convolution kernels of different sizes and strides from five classical neural networks, the average PE utilization is 4% to 11% higher than that of related work.
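Illustrative example (not from the paper): the abstract describes a reconfigurable array in which each PE accumulates partial sums from streamed activation/weight pairs, with the mapping controller balancing the load across PEs. Below is a minimal Verilog sketch of what one such multiply-accumulate PE might look like; the module name, port names, and bit widths are assumptions for illustration only, not the authors' actual design.

module pe_unit #(
    parameter DATA_W = 8,   // assumed activation/weight bit width
    parameter ACC_W  = 24   // assumed accumulator bit width
) (
    input  wire                     clk,
    input  wire                     rst_n,
    input  wire                     en,       // PE enabled by the mapping controller
    input  wire                     acc_clr,  // clear accumulator at the start of an output window
    input  wire signed [DATA_W-1:0] act,      // input activation
    input  wire signed [DATA_W-1:0] wgt,      // weight streamed from the register file
    output reg  signed [ACC_W-1:0]  psum      // partial sum for the current output pixel
);
    // One MAC per cycle: accumulate over the kernel window and input channels.
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            psum <= {ACC_W{1'b0}};
        else if (acc_clr)
            psum <= {ACC_W{1'b0}};
        else if (en)
            psum <= psum + act * wgt;
    end
endmodule

In such a sketch, kernel size and stride flexibility would be handled outside the PE by the controller that sequences which activation/weight pairs are streamed in and when acc_clr fires, which is consistent with (but not a description of) the dataflow optimization the abstract reports.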
Classification code: TN492 [Electronics and Telecommunications: Microelectronics and Solid-State Electronics]