Authors: HE Qian-hua (贺前华)[1]; CHEN Yong-qiang (陈永强); ZHENG Ruo-wei (郑若伟); HUANG Jin-xin (黄金鑫)
Affiliation: [1] School of Electronic and Information Engineering, South China University of Technology, Guangzhou, Guangdong 510641, China
Source: Acta Electronica Sinica (电子学报), 2024, No. 10, pp. 3482-3492 (11 pages)
Funding: Guangdong Provincial Science and Technology Program (No. 2023A0505050116, No. 2022A1515011687); National Natural Science Foundation of China (No. 62371195).
Abstract: End-to-end deep learning is currently the mainstream approach to speech keyword spotting. Research has focused on network structure optimization, the choice of modeling units, and search strategies, and has progressed quickly, but comparatively little attention has been paid to training efficiency. To address this, the paper proposes a class uncertainty sampling (CUS) strategy that selects effective samples for each training epoch and so accelerates convergence. In the middle and late stages of training, CUS uses the forward outputs of the network's output layer to measure the class uncertainty of each sample, converts that measure into a selection probability, and randomly draws a subset of training samples for subsequent training. Because easy samples have high class certainty, they become less likely to take part in later training. This does not impair the model's discriminative ability, and it shifts attention toward samples near the decision boundary, which are prone to missed detection or false alarm. Since only a subset of the data is used, training time is saved. Experiments on the AISHELL-1 Mandarin dataset show that, relative to the conventional training strategy, the average training time is shortened by 60% and the average convergence time by 47.5%. At a false alarm rate (FAR) of 0.5 FP/h, the false reject rate (FRR) falls from 4.75% to 3.65%, reported as a relative reduction of 30.1%, and the maximum term weighted value (MTWV) rises from 0.8374 to 0.8531. An analysis of how mislabeled samples participate in training confirms that the method shields most of them, reducing the damage they do to training. Experiments on the large-scale AISHELL-2 Mandarin dataset further confirm the effectiveness of the proposed method.
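Classification: TN912 [Electronics and Telecommunications: Communication and Information Systems]; TP391 [Electronics and Telecommunications: Information and Communication Engineering]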
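
The sampling step described in the abstract (measure class uncertainty from the output layer, turn it into a selection probability, then randomly draw a training subset) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the uncertainty formula, so normalized entropy of the softmax posterior is assumed here, and the names `class_uncertainty`, `select_subset`, and the `floor` parameter are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over one sample's output-layer logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_uncertainty(probs):
    """ASSUMPTION: uncertainty = normalized entropy of the class posterior.

    Returns a value in [0, 1]: 0 for a one-hot posterior (easy sample),
    1 for a uniform posterior (sample on the decision boundary).
    Requires at least two classes.
    """
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return min(1.0, entropy / math.log(len(probs)))


def select_subset(all_logits, floor=0.05, rng=None):
    """Draw a random training subset for the next epoch.

    Each sample is kept independently with probability
    max(floor, uncertainty). `floor` is a hypothetical knob that keeps
    easy samples from disappearing entirely; it is not from the paper.
    Returns the indices of the selected samples.
    """
    rng = rng or random.Random()
    selected = []
    for idx, logits in enumerate(all_logits):
        p_keep = max(floor, class_uncertainty(softmax(logits)))
        if rng.random() < p_keep:
            selected.append(idx)
    return selected


if __name__ == "__main__":
    # Two confident (easy) samples and two ambiguous ones, two classes each.
    logits = [[9.0, -9.0], [-8.0, 8.0], [0.1, 0.0], [0.0, 0.0]]
    print(select_subset(logits, rng=random.Random(0)))
```

In a training loop this selection would be applied only from the middle of training onward, as the abstract describes, with the full dataset used in the early epochs.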