Design of Deep Learning VLIW Processor for Image Recognition


Authors: LI Lin; ZHANG Shengbing [1]; WU Juan [3]

Affiliations: [1] School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an, Shaanxi 710072, China; [2] Fourth Design Department, Beijing Institute of Microelectronics Technology, Beijing 100076, China; [3] School of Animation and Software, Xi'an Vocational and Technical College, Xi'an, Shaanxi 710077, China

Source: Journal of Northwestern Polytechnical University, 2020, No. 1, pp. 216-224 (9 pages)

Abstract: To meet the demands of high-resolution image recognition and efficient on-board (local) processing in the aviation and aerospace fields, and to address the insufficient computational parallelism of existing work, we design an extensible deep learning processor architecture based on very long instruction word (VLIW) with multiple processor clusters, building on computation optimizations for each layer of the deep convolutional neural network model. The design combines parallel processing of feature maps and neurons, VLIW-based instruction-level parallelism, data-level parallelism across processor clusters, and pipelining. Tests on an FPGA prototype system show that the processor can effectively perform image classification and object detection. At an operating frequency of 200 MHz, the processor reaches a peak performance of 128 GOP/s. On the selected benchmarks, it is at least 12 times faster than a CPU and at least 7 times faster than a GPU. Compared with the results of the software framework, the average error in the processor's test accuracy does not exceed 1%.
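The abstract names two forms of parallelism inside a convolution layer, across output feature maps and across output neurons, and quotes a peak of 128 GOP/s at 200 MHz. The Python sketch below is an illustrative assumption, not the authors' design: it tiles a stride-1, unpadded convolution so that each innermost step covers up to `tm` output feature maps times `tn` output neurons, the work a hypothetical `tm` x `tn` multiply-accumulate array might issue in one cycle, and it counts those steps. The names `conv2d_tiled`, `ops_per_cycle`, `tm`, and `tn` are hypothetical; `ops_per_cycle` only restates the quoted peak figure as operations per clock cycle.

```python
def conv2d_tiled(x, w, tm=4, tn=4):
    """Stride-1, unpadded direct convolution tiled over output feature maps
    (tm at a time) and output neurons (tn at a time).

    x: input feature maps as nested lists, shape [C][H][W]
    w: kernels as nested lists, shape [M][C][K][K]
    Returns (y, steps): y has shape [M][H-K+1][W-K+1]; steps counts the
    innermost tile steps, each covering up to tm * tn multiply-accumulates
    that a hypothetical tm x tn MAC array could issue together.
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    M, K = len(w), len(w[0][0])
    OH, OW = H - K + 1, W - K + 1
    n_out = OH * OW                      # output neurons per feature map
    y = [[[0] * OW for _ in range(OH)] for _ in range(M)]
    steps = 0
    for m0 in range(0, M, tm):           # tile of output feature maps
        for p0 in range(0, n_out, tn):   # tile of output neurons (row-major)
            for c in range(C):           # accumulate over input feature maps
                for ki in range(K):
                    for kj in range(K):
                        steps += 1       # one parallel step of the array
                        for m in range(m0, min(m0 + tm, M)):
                            for p in range(p0, min(p0 + tn, n_out)):
                                i, j = divmod(p, OW)
                                y[m][i][j] += w[m][c][ki][kj] * x[c][i + ki][j + kj]
    return y, steps


def ops_per_cycle(peak_gops, freq_mhz):
    """Convert a peak throughput in GOP/s and a clock in MHz to ops per cycle."""
    return peak_gops * 1e9 / (freq_mhz * 1e6)


if __name__ == "__main__":
    print(ops_per_cycle(128, 200))  # 640.0
```

Under this model, 128 GOP/s at 200 MHz corresponds to 640 operations per cycle. If a multiply-accumulate is counted as two operations, a common convention, that would be 320 MACs per cycle, though the counting convention the paper uses is not stated here.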

Keywords: image recognition; deep learning; convolutional neural network; very long instruction word (VLIW); processor; scalable

CLC number: TP389.1 [Automation and Computer Technology: Computer Systems Architecture]
