End-to-End Speech Keyword Spotting Training Method Based on Sample's Class Uncertainty

Authors: HE Qian-hua [1]; CHEN Yong-qiang; ZHENG Ruo-wei; HUANG Jin-xin

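Affiliation: [1] School of Electronic and Information Engineering, South China University of Technology, Guangzhou, Guangdong 510641, China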

Source: Acta Electronica Sinica, 2024, No. 10, pp. 3482-3492 (11 pages)

Funding: Guangdong Provincial Science and Technology Program (No. 2023A0505050116, No. 2022A1515011687); National Natural Science Foundation of China (No. 62371195).

Abstract: End-to-end deep learning is currently the mainstream approach to speech keyword spotting. Research has focused on network structure optimization, the choice of modeling units, and search strategies, and has progressed quickly, but comparatively little attention has been paid to training efficiency. To address the training efficiency of deep learning models, this paper proposes a class uncertainty sampling (CUS) strategy that selects the samples used in each training epoch and thereby accelerates convergence. Because only a subset of the data is used, much training time is saved. The core idea is that, in the middle and late stages of training, the forward output of the network's output layer is used to measure each sample's class uncertainty; this measure is converted into a selection probability, and a subset of training samples is drawn at random for subsequent training. Because easy samples have high class certainty, their probability of taking part in subsequent training falls. This does not impair the model's discriminative ability; instead, it increases the attention paid to samples near the decision boundary, which are prone to missed detections or false alarms, and so improves training efficiency. Experimental results on the AISHELL-1 Mandarin dataset show that, relative to the conventional training strategy, the average training time is shortened by 60% and the convergence time by 47.5%. At a false alarm rate (FAR) of 0.5 FP/h, the false reject rate (FRR) is reduced from 4.75% to 3.65%, a relative reduction of 30.1%, and the maximum term weighted value (MTWV) increases from 0.8374 to 0.8531. An analysis of how mislabeled samples take part in training confirms that the method screens out most mislabeled samples, reducing the harm they cause to training. Experiments on the large-scale AISHELL-2 Mandarin dataset further confirm the effectiveness of the proposed method.
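The sampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how class uncertainty is measured or how it is mapped to a selection probability, so the sketch assumes normalized entropy of a per-sample posterior vector as the uncertainty measure and uses that value directly as the keep-probability. The function names and the `floor` parameter are hypothetical.

```python
import math
import random


def class_uncertainty(posteriors):
    """Normalized entropy in [0, 1] of one sample's class posteriors.

    ASSUMPTION: entropy stands in for the paper's uncertainty measure,
    which the abstract does not specify.
    """
    n_classes = len(posteriors)
    if n_classes < 2:
        return 0.0
    entropy = -sum(p * math.log(p) for p in posteriors if p > 0.0)
    return entropy / math.log(n_classes)


def selection_probabilities(batch_posteriors, floor=0.05):
    """Map each sample's uncertainty to a keep-probability.

    ASSUMPTION: `floor` is a hypothetical lower bound so that easy
    samples are rarely, rather than never, revisited.
    """
    return [max(floor, class_uncertainty(p)) for p in batch_posteriors]


def sample_subset(batch_posteriors, rng, floor=0.05):
    """Return indices of the samples drawn for the next training epoch."""
    probs = selection_probabilities(batch_posteriors, floor)
    # One independent Bernoulli draw per sample.
    return [i for i, p in enumerate(probs) if rng.random() < p]


if __name__ == "__main__":
    posteriors = [
        [0.99, 0.005, 0.005],  # easy sample: confident prediction
        [0.40, 0.35, 0.25],    # near the decision boundary
        [1.00, 0.00, 0.00],    # fully certain sample
    ]
    rng = random.Random(0)
    print(selection_probabilities(posteriors))
    print(sample_subset(posteriors, rng))
```

Under these assumptions, confidently classified samples are kept with a probability close to `floor`, while samples whose posteriors are spread across classes are kept almost every epoch, which mirrors the abstract's emphasis on samples near the decision boundary.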

Keywords: detection; deep learning; end-to-end; class uncertainty sampling

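Classification: TN912 [Electronics and Telecommunications: Communication and Information Systems]; TP391 [Electronics and Telecommunications: Information and Communication Engineering]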
