Authors: FENG Chengli (冯成立), CHENG Wen (程雯)[1] (Wuhan Research Institute of Posts & Telecommunications, Wuhan 430000)
Source: Computer & Digital Engineering (《计算机与数字工程》), 2023, No. 2, pp. 440-444 (5 pages)
Abstract: The conventional speech recognition acoustic model DFCNN extracts speech features with a deep convolutional model that considers only local features, cannot selectively emphasize the most informative acoustic features, and trains slowly with poor convergence. To address these problems, this paper proposes DRCNN, an acoustic model based on a deep residual convolutional neural network. Combined with CTC, DRCNN models the acoustic features directly; an SE-Block channel-weighted residual mechanism and a deeply stacked structure speed up acoustic feature extraction, improve fitting capacity, and increase training speed. A Transformer-based language model is then built on top of this acoustic model. Compared with the conventional DFCNN-HMM model, the proposed system learns deeper features of the speech signal and improves the robustness of both the acoustic model and the language model. Experimental results on a Chinese speech recognition dataset show that the proposed algorithm improves on DFCNN-HMM by 4.03% in character-level error rate (WER).
Keywords: speech recognition; CNN; Transformer; self-attention mechanism; residual connection; SE-Block
Classification number: TP301.6 [Automation and Computer Technology: Computer System Architecture]
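
The abstract mentions an SE-Block channel-weighted residual mechanism without giving details. The following is a minimal sketch of the general squeeze-and-excitation residual idea, not the authors' DRCNN implementation. The function name `se_residual`, the flat-list data layout, and the weight matrices `w1` and `w2` are assumptions made for illustration, and the convolutional branch output is taken as a given input rather than computed.

```python
import math


def se_residual(x, branch, w1, w2):
    """Illustrative squeeze-and-excitation residual step (hypothetical helper).

    x, branch: lists of C channels, each a flat list of floats
               (x is the block input, branch is the conv branch output).
    w1: R x C weight matrix for the bottleneck layer (assumed, no bias).
    w2: C x R weight matrix for the expansion layer (assumed, no bias).
    """
    # Squeeze: global average over each channel of the branch output.
    z = [sum(ch) / len(ch) for ch in branch]
    # Excitation, step 1: bottleneck fully connected layer with ReLU.
    h = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    # Excitation, step 2: expansion layer with sigmoid, one gate per channel.
    gates = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, h))))
             for row in w2]
    # Scale each branch channel by its gate, then add the identity shortcut.
    return [[xi + g * bi for xi, bi in zip(xc, bc)]
            for xc, bc, g in zip(x, branch, gates)]
```

With all-zero weights every gate equals 0.5, so the output is the input plus half of the branch output. In a trained network the gates would differ per channel, which is what lets the block weight some acoustic feature channels more heavily than others.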