Authors: JI Haolin (籍浩林); XU Wei (徐伟)[1,3]; PIAO Yongjie (朴永杰)[1,3]; WU Xiaobin (吴晓斌); GAO Tan (高倓)
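Affiliations: [1] Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, Jilin 130033, China [2] University of Chinese Academy of Sciences, Beijing 100049, China [3] Key Laboratory of Space-Based Dynamic & Rapid Optical Imaging Technology, Chinese Academy of Sciences, Changchun, Jilin 130033, China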
Source: Chinese Journal of Liquid Crystals and Displays (《液晶与显示》), 2025, No. 3, pp. 448-456 (9 pages)
Funding: Innovation Workstation Development Fund of the Qian Xuesen Laboratory of Space Technology (No. GZZKFJJ2020003).
Abstract: Limited computing power and storage resources on hardware platforms make it a primary challenge for hardware designers to implement energy-efficient, high-performance convolutional neural networks (CNNs) on embedded systems. This paper proposes a complete design of a heterogeneous embedded system implemented on a system-on-chip (SoC) with a field-programmable gate array (FPGA). The design adopts a cascadable input multiplexing structure and executes two independent multiply-accumulate operations in a single DSP, which reduces external memory access, improves system efficiency, and lowers power consumption. Compared with other designs, power efficiency improves by more than 38.7%. The design (framework) was successfully deployed for large-scale CNNs on low-cost devices and significantly improved the power efficiency of the network model; on the ZYNQ XC7Z045 device, power efficiency reaches 102 Gops/W. When the framework runs inference on the convolutional (CONV) layers of VGG-16, the frame rate reaches 10.9 fps, which shows that the design can effectively accelerate CNN inference in power-constrained environments.
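The abstract does not say how two multiply-accumulate operations fit into one DSP. As a rough illustration only, the sketch below shows one widely used idea: pack two narrow signed operands into a single wide word, multiply once by a shared operand, and split the wide product back into two results. The function name, the 8-bit operand width, the 18-bit field offset, and the shared third operand are all assumptions made for this example; they are not taken from the paper and are not tied to any particular DSP slice.

```python
SHIFT = 18  # assumed field offset; wide enough to hold any int8 * int8 product


def packed_dual_multiply(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (a*c, b*c) using a single wide multiplication.

    Hypothetical helper: a, b, c are assumed to be signed 8-bit values.
    """
    for v in (a, b, c):
        if not -128 <= v <= 127:
            raise ValueError("operands must fit in signed 8 bits")

    packed = (a << SHIFT) + b        # a sits in the high field, b in the low field
    product = packed * c             # one multiply: a*c*2^SHIFT + b*c

    # The low field holds b*c modulo 2^SHIFT; sign-extend it.
    low = product & ((1 << SHIFT) - 1)
    if low >= 1 << (SHIFT - 1):
        low -= 1 << SHIFT

    # Removing the low field undoes the borrow a negative b*c takes from a*c.
    high = (product - low) >> SHIFT
    return high, low


print(packed_dual_multiply(3, -5, 7))  # (21, -35)
```

The split is exact here because an int8 by int8 product never exceeds 2^14 in magnitude, so it always fits in the 18-bit low field. Accumulating several packed products before splitting would need extra headroom in that field.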