Robust speech recognition technology based on self-supervised knowledge transfer  Cited by: 2

Authors: BAI Caitong [1,2]; CUI Xiaolong [2,3]; ZHENG Huiji [1,2]; LI Ai [1,2]

Affiliations: [1] Postgraduate Brigade, Engineering University of PAP, Xi'an, Shaanxi 710086, China; [2] Counter-Terrorism Command Information Engineering Research Team, Engineering University of PAP, Xi'an, Shaanxi 710086, China; [3] Urumqi Campus of Engineering University of PAP, Urumqi, Xinjiang 830049, China

Source: Journal of Computer Applications, 2022, Issue 10, pp. 3217-3223 (7 pages)

Funding: National Natural Science Foundation of China (U1603261); Network-Information Integration Project (LXJH-10(A)-09).

Abstract: To address the rising cost of labeling neural network training data and the noise interference that limits the performance of speech recognition systems, a training algorithm for robust speech recognition models based on self-supervised knowledge transfer is proposed. First, three hand-crafted features are extracted from the raw speech samples in the preprocessing stage. Then, in the training stage, the high-level features produced by the feature extraction network are passed through three shallow networks, each of which fits one of the hand-crafted features extracted during preprocessing. At the same time, the feature extraction front-end and the speech recognition back-end are cross-trained and their loss functions are combined. Finally, gradient backpropagation drives the feature extraction network to learn high-level features that are more useful for noise-robust speech recognition, which transfers hand-crafted knowledge, performs denoising, and makes efficient use of the training data. In a military equipment control scenario, the method was tested on noise-added versions of three open-source Chinese speech recognition datasets, THCHS-30 (TsingHua Continuous Chinese Speech), AISHELL-1, and ST-CMDS (Surfing Technology Commands), together with a dataset of military equipment control commands. Experimental results show that the word error rate of the proposed training algorithm can be reduced to 0.12. The method can train robust speech recognition models, improves the utilization of training samples through self-supervised knowledge transfer, and can complete equipment control tasks.
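The combined objective described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the three hand-crafted features, the network shapes, the back-end loss, or how the losses are weighted, so everything below is an assumption made for illustration: each shallow network is reduced to a single dense layer, the recognition loss is a per-frame softmax cross-entropy stand-in, and `aux_weight` is a hypothetical weighting factor. It is not the authors' implementation.

```python
import math


def linear(x, weights, bias):
    """Dense layer: one output value per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def mse(pred, target):
    """Mean squared error between a prediction and a hand-crafted target."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def cross_entropy(logits, label):
    """Softmax cross-entropy for one frame (assumed stand-in for the ASR loss)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]


def combined_loss(high_level, heads, handcrafted, asr_logits, label,
                  aux_weight=0.1):
    """Recognition loss plus weighted regression losses of the shallow heads.

    high_level  : high-level feature vector from the feature extraction network
    heads       : three (weights, bias) pairs, one shallow head per target
    handcrafted : three hand-crafted feature vectors from preprocessing
    aux_weight  : hypothetical weighting factor (not given in the abstract)
    """
    aux = sum(mse(linear(high_level, w, b), target)
              for (w, b), target in zip(heads, handcrafted))
    asr = cross_entropy(asr_logits, label)
    return asr + aux_weight * aux, asr, aux


if __name__ == "__main__":
    identity = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
    heads = [identity, identity, identity]
    # Placeholder targets; the real hand-crafted features are not named here.
    handcrafted = [[1.0, 2.0], [1.0, 2.0], [0.0, 0.0]]
    total, asr, aux = combined_loss([1.0, 2.0], heads, handcrafted,
                                    [0.0, 0.0], label=0)
    print(f"total={total:.4f} asr={asr:.4f} aux={aux:.4f}")
```

In a real training loop, the gradient of the combined value would flow back through both the shallow heads and the recognition back-end into the shared feature extraction network, which is how the hand-crafted knowledge would reach the front-end.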

Keywords: knowledge transfer; robust speech recognition; self-supervised learning; Chinese speech recognition; speech denoising

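Classification: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]; TP309 [Automation and Computer Technology: Control Science and Engineering]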

 
