Authors: XIE Zhihao (谢志豪), LI Guogang (李国刚) [1,2] (School of Information and Engineering, Huaqiao University, Xiamen 361021, China; Xiamen Key Laboratory of ASIC and Power Semiconductor System, Huaqiao University, Xiamen 361021, China)
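Affiliations: [1] College of Information Science and Engineering, Huaqiao University, Xiamen 361021, Fujian, China; [2] Xiamen Key Laboratory of ASIC and Power Semiconductor System, Huaqiao University, Xiamen 361021, Fujian, China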
Source: Journal of Huaqiao University (Natural Science) (《华侨大学学报(自然科学版)》), 2025, No. 2, pp. 209-216 (8 pages)
Funding: National Natural Science Foundation of China (61370007); Fujian Provincial University Industry-University Cooperation Project (2023H6013).
Abstract: To address the challenge of efficiently deploying convolutional neural networks (CNNs), a heterogeneous CNN accelerator based on hardware-software co-design is proposed and validated on the YOLOv4-tiny model. The heterogeneous system is built from an advanced reduced instruction set machine (ARM) processor and a field programmable gate array (FPGA). Through high-level synthesis (HLS), the computational units that can execute in parallel are mapped to register transfer level (RTL) intellectual property (IP) cores on the FPGA. The ARM processor coordinates the system and schedules the IP cores, ultimately accelerating forward inference. The results show that the heterogeneous CNN accelerator operates at a frequency of 130 MHz with a power consumption of 2.809 W and an inference time of 511 ms, achieving a throughput of 13.40 GOPS. Compared with a desktop graphics processing unit (GPU), a central processing unit (CPU), and mainstream embedded AI acceleration platforms, the proposed design achieves a favorable balance between inference speed and power consumption while significantly improving key performance indicators. The designed heterogeneous CNN accelerator performs well in edge computing scenarios and meets the requirements for practical deployment.
Keywords: field programmable gate array (FPGA); hardware acceleration; hardware-software co-design; high-level synthesis
Classification: TP391.4 [Automation and Computer Technology: Computer Application Technology]
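
Note: The abstract reports power, inference time, and throughput separately. As a rough cross-check, the Python sketch below derives a few secondary figures from those three numbers. It assumes that the 511 ms is the latency of one forward pass and that the 13.40 GOPS is averaged over that pass; the function and key names are hypothetical and are not taken from the paper.

```python
# Figures reported in the abstract.
POWER_W = 2.809          # power consumption in watts
LATENCY_S = 0.511        # inference time in seconds (511 ms)
THROUGHPUT_GOPS = 13.40  # throughput in giga-operations per second


def derived_metrics(power_w: float, latency_s: float, throughput_gops: float) -> dict:
    """Derive secondary figures from the reported numbers.

    Assumption: latency_s is the time for one forward pass and
    throughput_gops is the average rate over that same pass.
    """
    return {
        "frames_per_second": 1.0 / latency_s,
        "energy_per_inference_j": power_w * latency_s,
        "gop_per_inference": throughput_gops * latency_s,
        "gops_per_watt": throughput_gops / power_w,
    }


if __name__ == "__main__":
    metrics = derived_metrics(POWER_W, LATENCY_S, THROUGHPUT_GOPS)
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions the reported figures correspond to roughly 1.96 frames per second, 1.44 J per inference, 6.85 GOP per inference, and 4.77 GOPS/W. These are derived values, not results stated in the source.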