Acceleration of CNN for VSLAM system on FPGA platform

Cited by: 1


Authors: YU Yuan (郁媛); LI Pei-jun (李沛君); WANG Guang-qi (王光奇); ZHANG De-bing (张德兵) [1]; ZHANG Chun (张春) [1]

Affiliation: [1] School of Integrated Circuits, Tsinghua University, Beijing 100084, China

Source: Computer Engineering and Design (《计算机工程与设计》), 2024, No. 1, pp. 71-78 (8 pages)

Funding: National Natural Science Foundation of China (U20A20220).

Abstract: To accelerate the convolutional neural network (CNN) used in visual simultaneous localization and mapping (VSLAM) systems on an FPGA, a low-power, efficient CNN accelerator and its corresponding SoC system are designed based on the SuperPoint model. Loop tiling, data reuse, compute-unit unrolling (parallel computation), and double buffering are used to make full use of the accelerator's on-chip resources. To improve burst-transfer efficiency, the weight parameters are rearranged in advance. A Pack module and an Unpack module are proposed, and multi-channel data transfer is designed to increase the transfer bandwidth. The whole SoC system is deployed on the Ultra96-V2 FPGA platform and achieves a throughput of 25.63 GOPS at a power consumption of only about 3 W. Its BRAM efficiency, DSP efficiency, performance density, and power efficiency show clear advantages over previously published work.
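The loop tiling named in the abstract can be illustrated with a minimal software sketch. This is not the paper's accelerator design: the function names `conv_direct` and `conv_tiled`, the tile sizes `Tm`, `Tc`, `Th`, `Tw`, the loop order, and the stride-1, no-padding layout are all assumptions chosen for illustration. The sketch only shows that splitting the convolution loops into tiles, each small enough to fit in on-chip buffers, yields the same result as the untiled loop nest.

```python
def _shape(ifm, w):
    # ifm[c][y][x] is the input feature map, w[m][c][ky][kx] the weights.
    C, H, W = len(ifm), len(ifm[0]), len(ifm[0][0])
    M, K = len(w), len(w[0][0])
    return C, M, K, H - K + 1, W - K + 1  # stride 1, no padding (assumed)


def conv_direct(ifm, w):
    """Reference convolution with a plain, untiled loop nest."""
    C, M, K, OH, OW = _shape(ifm, w)
    out = [[[0] * OW for _ in range(OH)] for _ in range(M)]
    for m in range(M):
        for c in range(C):
            for y in range(OH):
                for x in range(OW):
                    for ky in range(K):
                        for kx in range(K):
                            out[m][y][x] += w[m][c][ky][kx] * ifm[c][y + ky][x + kx]
    return out


def conv_tiled(ifm, w, Tm=2, Tc=2, Th=2, Tw=2):
    """Same convolution, with the loops split into tiles.

    The four outer loops walk over tiles (on hardware, each step would be
    one buffer load from off-chip memory); the inner loops work on a single
    tile only. Tile sizes are hypothetical parameters.
    """
    C, M, K, OH, OW = _shape(ifm, w)
    out = [[[0] * OW for _ in range(OH)] for _ in range(M)]
    for m0 in range(0, M, Tm):              # output-channel tiles
        for y0 in range(0, OH, Th):         # output-row tiles
            for x0 in range(0, OW, Tw):     # output-column tiles
                for c0 in range(0, C, Tc):  # input-channel tiles
                    for m in range(m0, min(m0 + Tm, M)):
                        for c in range(c0, min(c0 + Tc, C)):
                            for y in range(y0, min(y0 + Th, OH)):
                                for x in range(x0, min(x0 + Tw, OW)):
                                    acc = 0
                                    for ky in range(K):
                                        for kx in range(K):
                                            acc += w[m][c][ky][kx] * ifm[c][y + ky][x + kx]
                                    # Partial sums accumulate across input-channel tiles.
                                    out[m][y][x] += acc
    return out
```

In this sketch the output tile stays fixed while the input-channel tiles are iterated innermost among the tile loops, so partial sums for one output tile are reused without being written back in between. That is one possible data-reuse order, not necessarily the one used in the paper.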

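Keywords: simultaneous localization and mapping (SLAM) system; image processing; convolution acceleration; data reuse; parallel computing; burst transmission; hardware-software cooperation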

Classification code: TP332 [Automation and Computer Technology: Computer System Architecture]

 
