Authors: HUANG Zhi-rui (黄至锐); JIA Xin-ru (贾心茹); ZHU Hao-zhe (朱浩哲); CHEN Chi-xiao (陈迟晓) [1,2]
Affiliations: [1] State Key Laboratory of Integrated Chips and Systems, Fudan University, Shanghai 200433, China; [2] Frontier Institute of Chip and System, Fudan University, Shanghai 200438, China
Source: Computer Engineering & Science (《计算机工程与科学》), 2024, No. 8, pp. 1331-1339 (9 pages)
Funding: National Key Research and Development Program of China (2022YFB4500101).
Abstract: Deploying keyword spotting (KWS) algorithms on edge computing hardware incurs high power consumption, which shortens the battery life of battery-powered devices. To address this problem, this paper proposes a low-power KWS system based on computing-in-memory (CIM) technology and software-hardware co-optimization. At the algorithm level, a ternary-quantized MFCC-CNN joint algorithm is proposed based on the topology of the standard MFCC algorithm, and all general matrix multiplications (GEMMs) in MFCC are mapped onto the neural network accelerator. At the circuit level, an SRAM-based CIM core is proposed to address the power wall and memory wall of traditional von Neumann accelerators. In addition, by reusing the SRAM storage function of the CIM core, a look-up-table-based buffer circuit is proposed to replace the register delay chain. Both the SRAM CIM core and the SRAM buffer circuit are implemented with custom cells. At the system level, a low-power KWS system is designed around these two custom circuits. The system follows an ASIC and custom circuit design flow and is synthesized with a 28 nm CMOS process library. At 250 kHz, the system runs a 10-class classification task with a latency of 64 ms and an overall power consumption of 645.28 μW. The dynamic power of the MFCC pipeline accounts for 5.9% of the total dynamic power, and the total power of the MFCC pipeline accounts for only 1.3% of the system power.
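The abstract does not give the quantization rule or say which MFCC stages are treated as GEMMs, so the following is only a minimal sketch of the general idea of ternary quantization: weights are mapped to -1, 0, or +1, after which a matrix-vector product needs only additions and subtractions. The function names, the symmetric threshold of 0.5, and the use of a DFT cosine matrix as the example weight matrix are all illustrative assumptions, not the authors' method.

```python
import math


def ternarize(weights, threshold):
    """Map each real weight to -1, 0, or +1 using a symmetric threshold.

    The threshold rule is an assumed, generic scheme; the paper's actual
    quantizer is not described in the abstract.
    """
    return [[(w > threshold) - (w < -threshold) for w in row] for row in weights]


def ternary_matvec(ternary_weights, x):
    """Multiply a ternary matrix by a vector using only add and subtract."""
    out = []
    for row in ternary_weights:
        acc = 0
        for t, v in zip(row, x):
            if t == 1:
                acc += v
            elif t == -1:
                acc -= v
            # t == 0 contributes nothing, so the input is skipped
        out.append(acc)
    return out


def dft_cosine_matrix(n):
    """Real part of the n-point DFT matrix: entry (k, m) is cos(2*pi*k*m/n).

    Used here only as an example of a fixed matrix that appears in
    frequency-domain feature extraction.
    """
    return [[math.cos(2 * math.pi * k * m / n) for m in range(n)] for k in range(n)]


if __name__ == "__main__":
    frame = [3, 1, -2, 4, 0, -1, 2, 5]  # hypothetical integer audio samples
    tw = ternarize(dft_cosine_matrix(8), threshold=0.5)
    print(ternary_matvec(tw, frame))
```

Because every weight is -1, 0, or +1, the inner loop contains no multiplications, which is the property that makes ternary networks attractive for low-power accelerators in general.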
Keywords: keyword spotting; ternary quantized neural network; computing-in-memory; serial fast Fourier transform; software-hardware co-design
Classification code: TP391.41 [Automation and Computer Technology: Computer Application Technology]