Authors: BAI Caitong (柏财通) [1,2]; CUI Xiaolong (崔翛龙); ZHENG Huiji (郑会吉); LI Ai (李爱)
Affiliations: [1] Postgraduate Brigade, Engineering University of PAP, Xi'an, Shaanxi 710086, China; [2] Counter-Terrorism Command Information Engineering Research Team, Engineering University of PAP, Xi'an, Shaanxi 710086, China; [3] Urumqi Campus of Engineering University of PAP, Urumqi, Xinjiang 830049, China
Source: Journal of Computer Applications (《计算机应用》), 2022, No. 10, pp. 3217-3223 (7 pages)
Funding: National Natural Science Foundation of China (U1603261); Network and Information Integration Project (网信融合项目, LXJH-10(A)-09).
Abstract: To address the rising cost of labeling neural network training data and the noise interference that limits the performance of speech recognition systems, a training algorithm for robust speech recognition models based on self-supervised knowledge transfer is proposed. First, three hand-crafted features are extracted from the raw speech samples in the preprocessing stage. Then, in the training stage, the high-level features generated by the feature extraction network are passed through three shallow networks, each of which fits one of the hand-crafted features extracted during preprocessing. At the same time, the feature extraction front end and the speech recognition back end are cross-trained, and their loss functions are combined. Finally, gradient backpropagation drives the feature extraction network to learn high-level features that are more conducive to denoised speech recognition, which transfers hand-crafted knowledge, removes noise, and uses the training data efficiently. In a military equipment control scenario, the method was tested on noise-added versions of three open-source Chinese speech recognition datasets, THCHS-30 (TsingHua Continuous Chinese Speech), AISHELL-1 and ST-CMDS (Surfing Technology Commands), together with a dataset of military equipment control commands. Experimental results show that the word error rate of the proposed method can be reduced to 0.12, and that the method not only trains robust speech recognition models but also improves the utilization of training samples through self-supervised knowledge transfer and can complete equipment control tasks.
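The abstract states that three shallow networks fit the hand-crafted features and that the front-end and back-end losses are combined, but it does not give the formula. The following is a minimal sketch of one way such a combined objective could look. The mean-squared-error criterion, the per-head weights, and the names `mse` and `combined_loss` are assumptions for illustration only and are not taken from the paper.

```python
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between one head's output and its target feature."""
    if len(pred) != len(target):
        raise ValueError("prediction and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def combined_loss(
    asr_loss: float,
    head_preds: Sequence[Sequence[float]],
    handcrafted_targets: Sequence[Sequence[float]],
    weights: Sequence[float] = (1.0, 1.0, 1.0),
) -> float:
    """Sum the recognition loss and the three auxiliary fitting losses.

    asr_loss            -- loss of the speech recognition back end (assumed given)
    head_preds          -- outputs of the three shallow networks
    handcrafted_targets -- the three hand-crafted features from preprocessing
    weights             -- hypothetical per-head weights (assumption, not from the paper)
    """
    if not (len(head_preds) == len(handcrafted_targets) == len(weights)):
        raise ValueError("expected one prediction, target and weight per head")
    aux_losses = [mse(p, t) for p, t in zip(head_preds, handcrafted_targets)]
    return asr_loss + sum(w * a for w, a in zip(weights, aux_losses))


if __name__ == "__main__":
    # Toy values only; real features would be frame-level vectors.
    preds = [[1.0, 2.0], [0.0], [3.0]]
    targets = [[1.0, 4.0], [1.0], [3.0]]
    print(combined_loss(2.0, preds, targets))
```

In a real training loop the scalar returned here would be the quantity that is backpropagated through both the shallow heads and the shared feature extraction network, so the auxiliary targets shape the features used by the recognizer.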