Gated residual DFSMN acoustic models for large vocabulary speech recognition


Authors: HUO Weiming; XU Hao (GD Midea Air-Conditioning Equipment Co., Ltd., Foshan 528311)

Affiliation: [1] GD Midea Air-Conditioning Equipment Co., Ltd., Foshan 528311, Guangdong, China

Source: Journal of Appliance Science & Technology (《家电科技》), 2022, No. 5, pp. 22-25

Abstract: Deep Feedforward Sequential Memory Network (DFSMN) is a powerful acoustic model in terms of recognition accuracy. It alleviates the gradient-vanishing problem by introducing skip connections between memory blocks in adjacent layers. However, optimizing very deep DFSMNs remains a challenging task, and simply stacking more layers does not yield better models. Residual learning is an effective method that helps neural networks converge more easily and faster when building very deep structures. A novel network architecture named gated residual DFSMN (GR-DFSMN) is proposed. It introduces additional gate-controlled shortcut paths from lower DFSMN blocks for efficient training of networks with very deep DFSMN structures. Experimental results show that GR-DFSMN outperforms the original DFSMN when training very deep models. On the 1000-hour English Librispeech corpus, when the number of layers reaches 40, GR-DFSMN reduces the average word error rate across the four test sets by 0.7% compared with DFSMN.
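The abstract states only that GR-DFSMN adds gate-controlled shortcut paths from lower DFSMN blocks; the exact gating formulation is not given here. As a rough illustration, the sketch below assumes a highway-style, per-dimension sigmoid gate that mixes a block's transformed output with the lower-layer shortcut; `GatedResidualBlock`, `W_h`, and `W_g` are hypothetical names, and the `tanh` transform is a stand-in for the actual DFSMN memory block.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GatedResidualBlock:
    """Illustrative gated shortcut (assumed highway-style gate).

    out = g * h + (1 - g) * skip, where h is the block's transformed
    output and skip comes from a lower block. This is a sketch of the
    general technique, not the paper's exact formulation.
    """

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W_h = rng.standard_normal((dim, dim)) * 0.1  # block transform (stand-in for a DFSMN memory block)
        self.W_g = rng.standard_normal((dim, dim)) * 0.1  # gate weights
        self.b_g = np.zeros(dim)                          # gate bias

    def forward(self, x, skip):
        h = np.tanh(x @ self.W_h)             # transformed features, in (-1, 1)
        g = sigmoid(x @ self.W_g + self.b_g)  # per-dimension gate, in (0, 1)
        return g * h + (1.0 - g) * skip       # gated mix of new features and the lower-layer shortcut

block = GatedResidualBlock(dim=4)
x = np.ones((2, 4))      # (frames, features)
skip = np.zeros((2, 4))  # shortcut from a lower block
y = block.forward(x, skip)
print(y.shape)  # (2, 4)
```

Because the gate stays strictly between 0 and 1, gradients can always flow through the shortcut term, which is the usual motivation for gated residual paths when stacking very deep networks.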

Keywords: speech recognition; DFSMN; gated residual; CTC

Classification code: TN912.34 [Electronics & Telecommunications — Communication and Information Systems]
