Embedded Lightweight Convolutional Neural Network Computing Acceleration Method

Authors: XIE Yuan-yuan, LIU Yi-rui, CHEN Chi-xiao, KANG Xiao-yang, ZHANG Li-hua

Affiliations: [1] Academy for Engineering and Technology, Fudan University, Shanghai 200433, China; [2] School of Information Science and Technology, Fudan University, Shanghai 200433, China; [3] Frontier Institute of Chip and System, Fudan University, Shanghai 200433, China

Source: Journal of Chinese Computer Systems (小型微型计算机系统), 2023, No. 7, pp. 1345-1351 (7 pages)

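Funding: Supported by the General Program of the National Natural Science Foundation of China (61974033); the Young Scientists Fund of the National Natural Science Foundation of China (61904038); the National Key Research and Development Program of China (2021YFC0122702); the Shanghai Sailing Program (19YF1403600); the Biomedical Science and Technology Support Project of the Shanghai "Science and Technology Innovation Action Plan" (19441907600); Ji Hua Laboratory projects (X190021TB190, X190021TB193); the Shanghai Science and Technology Achievement Transformation and Industrialization Project (19511132000); and the Shanghai Municipal Science and Technology Major Project (2021SHZDZX0103).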

Abstract: Traditional ARM processors offer limited computing power, so they are poorly suited to applications with strict real-time requirements. To address this problem, this paper proposes a computing acceleration method for lightweight convolutional neural networks (CNNs). The method uses the single instruction multiple data (SIMD) instruction set of the ARM processor. It is applied to electroencephalogram (EEG) signals to monitor the depth of anesthesia during surgery. A lightweight CNN is obtained through learned step size quantization. This reduces the amount of floating-point computation and greatly increases network speed. A convolution accelerator built on the ARM SIMD instruction set speeds up individual convolutional layers by factors of tens or hundreds, and in some cases by more than 10,000. The entire network runs on the ARM processor of an Ultra 96-V2 development board. Tests on the University of Queensland Vital Signs public dataset show that a 1 s single-channel EEG segment is processed in only 39.64 ms. This is 10.5 times faster than the original implementation, with an energy consumption of only 0.1 J. Prediction accuracy is essentially preserved despite the speedup, and the method predicts the depth of anesthesia well.
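The abstract names learned step size quantization (LSQ) and integer SIMD convolution but gives no details. The following is a minimal sketch, not the authors' implementation. It assumes signed 8-bit codes for both weights and activations and follows the published LSQ forward rule and step-size initialisation (Esser et al., 2020). It shows how a floating-point convolution turns into integer multiply-accumulates followed by a single rescale. The function names `lsq_quantize`, `lsq_init_step`, `lsq_bounds` and `int_conv1d` are hypothetical. NumPy stands in for ARM SIMD intrinsics, and the signal and kernel are synthetic rather than taken from the Queensland dataset.

```python
import numpy as np


def lsq_bounds(bits=8):
    """Signed b-bit LSQ code range [-Q_N, Q_P]: Q_N = 2^(b-1), Q_P = 2^(b-1) - 1."""
    return 2 ** (bits - 1), 2 ** (bits - 1) - 1


def lsq_init_step(v, bits=8):
    """LSQ step-size initialisation: s = 2 * mean(|v|) / sqrt(Q_P)."""
    _, q_p = lsq_bounds(bits)
    return 2.0 * np.mean(np.abs(v)) / np.sqrt(q_p)


def lsq_quantize(v, step, bits=8):
    """LSQ forward pass: integer codes round(clip(v / s, -Q_N, Q_P)).

    np.round rounds exact halves to the nearest even integer.
    The dequantised value is codes * step.
    """
    q_n, q_p = lsq_bounds(bits)
    return np.round(np.clip(v / step, -q_n, q_p)).astype(np.int32)


def int_conv1d(x_q, w_q):
    """'Valid' 1-D cross-correlation on integer codes with int32 accumulation.

    Each output is an integer dot product. On an ARM core these
    multiply-accumulates are what SIMD (e.g. NEON) lanes would run in parallel.
    """
    k = len(w_q)
    out = np.empty(len(x_q) - k + 1, dtype=np.int32)
    for i in range(len(out)):
        out[i] = np.dot(x_q[i:i + k], w_q)
    return out


# Synthetic stand-ins: a 256-sample signal and a hypothetical 7-tap kernel.
rng = np.random.default_rng(0)
x = rng.standard_normal(256)
w = rng.standard_normal(7) * 0.1

s_x, s_w = lsq_init_step(x), lsq_init_step(w)
x_q, w_q = lsq_quantize(x, s_x), lsq_quantize(w, s_w)
y_int = int_conv1d(x_q, w_q)               # integer-only inner loop
y_approx = y_int * (s_x * s_w)             # one float rescale per output
y_ref = np.correlate(x, w, mode="valid")   # floating-point reference
print("max abs error:", np.max(np.abs(y_approx - y_ref)))
```

During training, LSQ also learns `step` by backpropagation through a straight-through estimator; only the inference-time forward pass is shown. Because both step sizes factor out of the sum, the inner loop never touches floating point. This is the property that lets a SIMD convolution kernel work entirely on narrow integers.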

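Keywords: network lightweighting; learned step size quantization; single instruction multiple data; dataflow architecture; electroencephalogram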

CLC number: TP391 [Automation and Computer Technology / Computer Application Technology]

 
