A Low-Resource Continuous Speech Keyword Spotting System Based on End-to-End Modeling

Author: Chen Mang (陈芒)

Affiliation: [1] 深圳市轻生活科技有限公司, Shenzhen, Guangdong 518045, China

Source: Modern Transmission (《现代传输》), 2023, No. 4, pp. 60-66 (7 pages)

Abstract: Speech keyword spotting has broad market demand. In embedded applications, limited device resources and complex, changing usage scenarios impose stricter requirements on keyword spotting systems: a small resource footprint, low power consumption, fast response, and good robustness. The low-resource continuous speech keyword spotting system designed and implemented in this paper is based on end-to-end acoustic modeling. Knowledge distillation, model quantization, and model pruning compress the model to 36.8 KB, and the running system occupies approximately 133 KB. Compared with a traditional isolated-word decoding algorithm, the continuous speech keyword decoding algorithm proposed in this paper improves recall in noisy conditions by 6.88 percentage points (absolute). Tested with 20 keywords on the BK3288 low-power SoC, which has a 120 MHz clock and 256 KB of memory, the system achieves a recall of 96.86% in quiet conditions and 74.81% in noisy conditions, with 0.2 false alarms per hour.

Keywords: low-resource; speech keyword spotting; model compression; token passing

Classification code: TN912.34 [Electronics and Telecommunications: Communication and Information Systems]
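
Note on the metrics: the abstract reports recall, false alarms per hour, and an "absolute" recall gain. The minimal Python sketch below shows how such figures are conventionally computed. It is an illustration under stated assumptions, not the paper's evaluation code: the function names are hypothetical, and every count in the usage lines is invented for the example rather than taken from the paper.

```python
def recall(hits: int, total_keywords: int) -> float:
    """Fraction of spoken keywords that were detected."""
    if total_keywords <= 0:
        raise ValueError("total_keywords must be positive")
    return hits / total_keywords


def false_alarms_per_hour(false_alarms: int, audio_seconds: float) -> float:
    """False triggers divided by the duration of the test audio in hours."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return false_alarms / (audio_seconds / 3600.0)


def absolute_gain_points(new_recall: float, old_recall: float) -> float:
    """Absolute gain in percentage points (a difference, not a ratio)."""
    return (new_recall - old_recall) * 100.0


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the formulas.
    print(recall(hits=950, total_keywords=1000))
    print(false_alarms_per_hour(false_alarms=2, audio_seconds=10 * 3600))
    print(absolute_gain_points(new_recall=0.75, old_recall=0.70))
```

Read this way, an absolute gain of 6.88% means the recall rose by 6.88 percentage points, not that it was multiplied by 1.0688.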

 
