Accelerating hybrid and compact neural networks targeting perception and control domains with coarse-grained dataflow reconfiguration  



Authors: Zheng Wang, Libing Zhou, Wenting Xie, Weiguang Chen, Jinyuan Su, Wenxuan Chen, Anhua Du, Shanliao Li, Minglan Liang, Yuejin Lin, Wei Zhao, Yanze Wu, Tianfu Sun, Wenqi Fang, Zhibin Yu

Affiliations: [1] Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; [2] School of Microelectronics, Xidian University, Xi'an 710071, China; [3] School of Information and Communication, Guilin University of Electronic Technology, Guilin 541004, China; [4] Changzhou Campus of Hohai University, Changzhou 213022, China

Source: Journal of Semiconductors (半导体学报, English edition), 2020, No. 2, pp. 29-41 (13 pages)

Funding: Supported by NSFC (Grant Nos. 61702493, 51707191); Science and Technology Planning Project of Guangdong Province (Grant No. 2018B030338001); Shenzhen S&T Funding (Grant No. KQJSCX20170731163915914); Basic Research Program (Nos. JCYJ20170818164527303, JCYJ20180507182619669); SIAT Innovation Program for Excellent Young Researchers (Grant No. 2017001)

Abstract: Driven by continuous scaling of nanoscale semiconductor technologies, the past years have witnessed the progressive advancement of machine learning techniques and applications. Recently, dedicated machine learning accelerators, especially for neural networks, have attracted the research interest of computer architects and VLSI designers. State-of-the-art accelerators increase performance by deploying a large number of processing elements, but still suffer from degraded resource utilization across hybrid and non-standard algorithmic kernels. In this work, we exploit the properties of important neural network kernels for both perception and control to propose a reconfigurable dataflow processor, which adjusts the data flow patterns, the functionality of processing elements, and the on-chip storage according to network kernels. In contrast to state-of-the-art fine-grained dataflow techniques, the proposed coarse-grained dataflow reconfiguration approach enables extensive sharing of computing and storage resources. Three hybrid networks for MobileNet, deep reinforcement learning and sequence classification are constructed and analyzed with customized instruction sets and toolchain. A test chip has been designed and fabricated in UMC 65 nm CMOS technology, with a measured power consumption of 7.51 mW at a frequency of 100 MHz on a die size of 1.8 × 1.8 mm^2.
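
The chip figures quoted in the abstract can be combined into two derived quantities that are often used when comparing accelerators: average energy per clock cycle and power per unit die area. The short Python sketch below is only an illustration built from the three reported numbers (7.51 mW, 100 MHz, 1.8 × 1.8 mm^2). The function names are hypothetical, and the result assumes the measured power is a whole-chip average and that the full die area is used, neither of which the abstract states.

```python
# Figures reported in the abstract.
POWER_MW = 7.51      # measured power consumption, in mW
FREQ_MHZ = 100.0     # operating frequency, in MHz
DIE_SIDE_MM = 1.8    # die edge length, in mm (die is 1.8 x 1.8 mm^2)


def energy_per_cycle_pj(power_mw: float, freq_mhz: float) -> float:
    """Average energy per clock cycle in picojoules.

    mW / MHz gives nanojoules per cycle; multiply by 1000 for pJ.
    Assumption: the reported power is an average over the whole chip.
    """
    return power_mw / freq_mhz * 1000.0


def power_density_mw_per_mm2(power_mw: float, side_mm: float) -> float:
    """Power per unit area in mW/mm^2 for a square die.

    Assumption: the full die area (including pads) is used, so the
    value for the core logic alone would be higher.
    """
    return power_mw / (side_mm * side_mm)


if __name__ == "__main__":
    print(f"energy per cycle: {energy_per_cycle_pj(POWER_MW, FREQ_MHZ):.1f} pJ")
    print(f"power density: {power_density_mw_per_mm2(POWER_MW, DIE_SIDE_MM):.2f} mW/mm^2")
```

Under these assumptions the chip draws about 75.1 pJ per cycle over a 3.24 mm^2 die. The abstract gives no throughput figure, so energy per operation cannot be derived from it.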

Keywords: CMOS technology; digital integrated circuits; neural networks; dataflow architecture

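Classification: TP332 [Automation and Computer Technology: Computer System Architecture]; TP183 [Automation and Computer Technology: Computer Science and Technology]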

 
