Authors: GUO Wenxu; SU Yuanqi [1]; LIU Yuehu [2]
Affiliations: [1] Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China; [2] College of Artificial Intelligence, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China
Source: Journal of Computer Applications (《计算机应用》), 2021, No. 3, pp. 669-676 (8 pages)
Funding: National Natural Science Foundation of China (61973245).
Abstract: High-accuracy object detection networks are hard to deploy directly on end devices such as vehicles and drones because of their sharply increased parameter counts and computational cost. To address this problem, the work tackles both network compression and computation acceleration: a new compression scheme for residual networks is proposed to compress YOLOv3 (You Only Look Once v3), and the compressed network is then accelerated on the ZYNQ platform. First, a network compression algorithm combining network pruning and network quantization is proposed. For pruning, a strategy tailored to residual structures divides pruning into two granularities, channel pruning and residual connection pruning, which overcomes the inability of channel pruning to handle residual connections and further reduces the number of model parameters. For quantization, a simulated quantization method based on relative entropy quantizes the parameters channel by channel and collects online statistics on the parameter distribution and on the information loss caused by quantization, which helps select the best quantization strategy and reduces the accuracy loss of the quantization process. Second, an 8-bit convolution acceleration module is designed and improved on the ZYNQ platform; it optimizes the on-chip cache structure and is combined with the Winograd algorithm to accelerate the compressed YOLOv3. Experimental results show that, compared with YOLOv3 tiny, the proposed compression algorithm yields a smaller model while improving detection accuracy by 7 percentage points. In addition, the hardware acceleration method on the ZYNQ platform achieves a higher energy efficiency ratio than other platforms, which advances the practical deployment of YOLOv3 and other residual networks on ZYNQ end devices.
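The per-channel, relative-entropy-guided quantization mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no algorithmic details, so the symmetric 8-bit scheme, the grid of candidate clipping thresholds, the histogram bin count, and every function name below are assumptions made for illustration. Relative entropy is computed as the Kullback-Leibler divergence between histograms of the original and the simulated-quantized weights.

```python
import math
import random

def fake_quantize(values, threshold, num_bits=8):
    """Simulated symmetric quantization (assumed scheme): clip to
    [-threshold, threshold], round to the integer grid, map back to float."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    scale = threshold / qmax
    return [round(max(-threshold, min(threshold, v)) / scale) * scale
            for v in values]

def histogram(values, lo, hi, bins):
    """Normalized histogram; a small floor keeps the KL divergence finite."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(bins - 1, max(0, int((v - lo) / width)))
        counts[idx] += 1
    eps = 1e-6
    total = len(values) + eps * bins
    return [(c + eps) / total for c in counts]

def kl_divergence(p, q):
    """Relative entropy D(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def choose_threshold(channel, num_bits=8, candidates=16, bins=64):
    """Try evenly spaced clipping thresholds (an assumed search grid) and
    keep the one whose quantized histogram is closest to the original."""
    max_abs = max(abs(v) for v in channel)
    if max_abs == 0.0:
        return 1.0, 0.0  # all-zero channel: any threshold is lossless
    p = histogram(channel, -max_abs, max_abs, bins)
    best = None
    for k in range(1, candidates + 1):
        t = max_abs * k / candidates
        q = histogram(fake_quantize(channel, t, num_bits),
                      -max_abs, max_abs, bins)
        kl = kl_divergence(p, q)
        if best is None or kl < best[1]:
            best = (t, kl)
    return best

def quantize_per_channel(weights, num_bits=8):
    """weights: list of channels, each a flat list of floats.
    Returns one (threshold, kl) pair per channel."""
    return [choose_threshold(ch, num_bits) for ch in weights]

if __name__ == "__main__":
    random.seed(0)
    # Two hypothetical channels with very different weight ranges.
    demo = [[random.gauss(0.0, s) for _ in range(500)] for s in (0.1, 1.0)]
    for i, (t, kl) in enumerate(quantize_per_channel(demo)):
        print(f"channel {i}: threshold={t:.4f}, KL={kl:.6f}")
```

Choosing a threshold per channel rather than per layer lets channels with narrow weight ranges keep a finer quantization step, which is one plausible reading of quantizing "channel by channel".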
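Keywords: object detection; neural network compression; computation acceleration; network pruning; network quantization; ZYNQ platform
Classification number: TP391.4 [Automation and Computer Technology: Computer Application Technology]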