Affiliations: [1] School of Computer Science and Technology, Xidian University, Xi'an 710000; [2] Xi'an Branch, China Academy of Space Technology, Xi'an 710000
Source: Chinese Journal of Computers (《计算机学报》), 2022, No. 10, pp. 2047-2064 (18 pages)
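Funding: Supported by the National Natural Science Foundation of China (62171342, 61850410523) and the Space TT&C Communication Innovation Exploration Fund (201701B).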
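Abstract: Convolutional neural networks (CNNs) are an indispensable part of today's mainstream vision algorithms. To speed up CNN inference, researchers have proposed many heterogeneous acceleration methods to meet the diverse needs of different scenarios. However, accelerating CNNs stably and efficiently on in-orbit satellites, where resources and energy are constrained, remains a highly challenging problem. To address it, this paper uses software-hardware co-design to optimize three key parts of accelerator design: microinstruction encoding, instruction-level parallelism, and operation-level parallelism. It designs and implements a CNN accelerator whose dataflow is scheduled by microinstruction sequences on the Xilinx VX690T FPGA, a chip commonly used on satellites. At the software level, the paper proposes an extensible microinstruction encoding format and a corresponding compilation method. Graph-level optimization is achieved through convolution loop tiling and operator fusion, producing microinstruction sequences that the accelerator can execute. At the hardware level, the paper designs and implements an RTL-level CNN accelerator composed of a microcontroller and a logic arithmetic unit. The microcontroller executes the different instruction types in parallel through a coarse-grained pipeline. The logic arithmetic unit performs fine-grained parallel computation of convolution operators on a compute array built by cascading DSP48E1 resources. Experimental results show that the accelerator's design power is 10.68 W and that, when accelerating the YOLOV3Tiny algorithm, its peak throughput (Runtime Max Throughput, RMT) reaches 378.63 GOP/s and its computing resource utilization efficiency (MAC Efficiency, ME) reaches 91.5%. Compared with a typical GPU acceleration method, the accelerator improves energy efficiency 14-fold. Compared with FPGA accelerators of the same kind, ME improves by more than 6.9%.

Recently, with the evolution of space remote sensing technology, the main earth observation platform has been gradually transitioning from a single satellite to a constellation composed of light and small satellites. A constellation of several high-resolution satellites collects hundreds of TBs (terabytes) of RSI (Remote Sensing Image) data every day. The traditional satellite-to-ground data transmission mechanism can no longer keep up with this volume of remote sensing data. In-orbit satellites need to improve their data processing capabilities to deal with increasingly complex observation missions. Meanwhile, in the field of RSI processing, deep learning algorithms based on CNNs (Convolutional Neural Networks) have become the mainstream method because of their excellent performance. However, their computation-intensive and memory-intensive nature brings many challenges to CNN deployment. Academia and industry have proposed many CNN-specific acceleration methods to cope with the various application scenarios. Numerous FPGA (Field Programmable Gate Array) and ASIC (Application Specific Integrated Circuit) accelerators have been designed to accelerate CNNs in edge and data center scenarios. Compared with ASICs, FPGAs offer higher flexibility and faster development iteration, making them well suited to spaceborne scenarios. In this paper, we propose a microinstruction-driven CNN accelerator for RSI processing on FPGA. The accelerator is designed jointly in software and hardware, and mainly optimizes microinstruction coding, instruction-level parallelism (Coarse-Grained Parallelism), and operation-level parallelism (Fine-Grained Parallelism) under the constraints of limited storage bandwidth and computing resources on satellites. At the software level, we propose an extensible microinstruction encoding format and the corresponding compilation method (Micro Assembler). A microinstruction code covers 14 instructions in 4 types, which can schedule the dataflow between different components of the accelerator. The micro assembler perform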
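The abstract names convolution loop tiling as one of the compiler's graph-level optimizations but does not give the tiling scheme. As a minimal sketch of the general technique only, the Python below assumes tiling over output channels, output rows, and output columns. The function names and the tile sizes `tm`, `tr`, `tc` are hypothetical and do not come from the paper. It checks that a tiled loop nest produces the same result as a direct convolution.

```python
from itertools import product


def _mac(x, w, m, r, c):
    """Multiply-accumulate for one output element y[m][r][c]."""
    n_in, k = len(x), len(w[0][0])
    return sum(
        x[ch][r + i][c + j] * w[m][ch][i][j]
        for ch, i, j in product(range(n_in), range(k), range(k))
    )


def _output_shape(x, w):
    """Output (channels, rows, cols) for stride 1 and no padding."""
    k = len(w[0][0])
    return len(w), len(x[0]) - k + 1, len(x[0][0]) - k + 1


def conv2d_direct(x, w):
    """Reference convolution: x[C][H][W], w[M][C][K][K] -> y[M][R][Q]."""
    n_out, rows, cols = _output_shape(x, w)
    return [
        [[_mac(x, w, m, r, c) for c in range(cols)] for r in range(rows)]
        for m in range(n_out)
    ]


def conv2d_tiled(x, w, tm, tr, tc):
    """Same convolution, computed one (tm x tr x tc) output tile at a time.

    Assumption: each tile stands for a block of work that would fit an
    accelerator's on-chip buffers; tile sizes here are illustrative.
    """
    n_out, rows, cols = _output_shape(x, w)
    y = [[[0] * cols for _ in range(rows)] for _ in range(n_out)]
    # Outer loops walk over tile origins.
    for m0, r0, c0 in product(
        range(0, n_out, tm), range(0, rows, tr), range(0, cols, tc)
    ):
        # Inner loops stay inside one tile, clipped at the tensor edge.
        for m in range(m0, min(m0 + tm, n_out)):
            for r in range(r0, min(r0 + tr, rows)):
                for c in range(c0, min(c0 + tc, cols)):
                    y[m][r][c] = _mac(x, w, m, r, c)
    return y
```

In a sketch like this, the tile sizes would be chosen so that one tile's inputs, weights, and outputs fit in on-chip memory. How the paper's compiler actually selects them is not stated in the abstract.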
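Keywords: convolutional neural network; microinstruction sequence; field-programmable gate array; remote sensing object detection; microprocessor design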
Classification: TP303 [Automation and Computer Technology: Computer System Architecture]