Authors: YU Yuan (郁媛), LI Pei-jun (李沛君), WANG Guang-qi (王光奇), ZHANG De-bing (张德兵)[1], ZHANG Chun (张春)[1] (School of Integrated Circuits, Tsinghua University, Beijing 100084, China)
Source: Computer Engineering and Design (《计算机工程与设计》), 2024, No. 1, pp. 71-78 (8 pages)
Funding: National Natural Science Foundation of China (U20A20220).
Abstract: To accelerate the convolutional neural network (CNN) of a visual simultaneous localization and mapping (SLAM) system on an FPGA, a low-power, efficient CNN accelerator and its corresponding SoC system are designed based on the SuperPoint model. Loop tiling, data reuse, compute-unit unrolling (parallel computation), and double buffering are adopted to make full use of the accelerator's on-chip resources. To improve burst transfer efficiency, the weight parameters are rearranged in advance. Pack and Unpack modules are proposed, and multi-channel data transfer is designed to increase transfer bandwidth. The whole SoC system is deployed on the Ultra96-V2 FPGA platform and achieves a throughput of 25.63 GOPS at a power consumption of only about 3 W. Its BRAM efficiency, DSP efficiency, performance density, and power efficiency show clear advantages over previously published work.
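Keywords: simultaneous localization and mapping (SLAM); image processing; convolution acceleration; data reuse; parallel computing; burst transfer; hardware-software co-design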
Classification: TP332 [Automation and Computer Technology: Computer System Architecture]
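Note: The abstract names loop tiling as one of the strategies for using on-chip resources. The sketch below illustrates the general idea in plain Python and is not the authors' hardware design. The function names, the tile sizes (`tr`, `tc`, `tm`, `tn`), the loop order, and the stride-1, no-padding convolution are all assumptions made for illustration; the record does not state the paper's actual parameters.

```python
def conv2d_naive(x, w):
    """Reference convolution. x: [C_in][H][W], w: [C_out][C_in][K][K].
    Stride 1, no padding (assumed for illustration)."""
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    c_out, k = len(w), len(w[0][0])
    oh, ow = h - k + 1, wd - k + 1
    y = [[[0] * ow for _ in range(oh)] for _ in range(c_out)]
    for co in range(c_out):
        for r in range(oh):
            for c in range(ow):
                acc = 0
                for ci in range(c_in):
                    for i in range(k):
                        for j in range(k):
                            acc += x[ci][r + i][c + j] * w[co][ci][i][j]
                y[co][r][c] = acc
    return y


def conv2d_tiled(x, w, tr=2, tc=2, tm=2, tn=2):
    """Same convolution with the row, column, output-channel and
    input-channel loops split into tiles (hypothetical tile sizes)."""
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    c_out, k = len(w), len(w[0][0])
    oh, ow = h - k + 1, wd - k + 1
    y = [[[0] * ow for _ in range(oh)] for _ in range(c_out)]
    # Outer loops walk over tiles. On an accelerator, each iteration would
    # correspond to loading one input tile and one weight tile on chip.
    for r0 in range(0, oh, tr):
        for c0 in range(0, ow, tc):
            for m0 in range(0, c_out, tm):
                for n0 in range(0, c_in, tn):
                    # Inner loops compute within one tile; min() handles
                    # edge tiles that are smaller than the tile size.
                    for co in range(m0, min(m0 + tm, c_out)):
                        for ci in range(n0, min(n0 + tn, c_in)):
                            for r in range(r0, min(r0 + tr, oh)):
                                for c in range(c0, min(c0 + tc, ow)):
                                    for i in range(k):
                                        for j in range(k):
                                            y[co][r][c] += (
                                                x[ci][r + i][c + j] * w[co][ci][i][j]
                                            )
    return y
```

Tiling only reorders the accumulation, so with integer inputs both functions return identical results. In hardware, the tile sizes would be chosen so that each tile fits the available on-chip buffers.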