Authors: HUANG Huibo; SHAO Yubin [1]; LONG Hua [1]; DU Qingzhi [1]
Affiliation: [1] Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, P.R. China
Source: Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), 2024, No. 4, pp. 786-796 (11 pages)
Funding: Yunnan Provincial Key Laboratory of Media Convergence project (220235205).
Abstract: To address the low accuracy of phoneme recognition at low signal-to-noise ratios, a new recognition method is proposed. First, Fbank features are extracted from the speech and fed into the A-R-B-CTC model, which is built from a multi-head attention mechanism, ResNet, BLSTM, and CTC, for phoneme recognition. Then Wave-U-Net is used to perform image denoising on four speech features (Fbank, MFCC, GFCC, and the logarithmic spectrum), and denoising the Fbank features is found to yield a lower phoneme error rate. The THCHS30 dataset is used for experimental validation in a 0 dB white noise environment. The results show that, before Fbank denoising, the proposed A-R-B-CTC model reduces the average phoneme error rate by 4.38%, 2.5%, and 1.96% compared with the BLSTM-CTC, ResNet-BLSTM-CTC, and Transformer models, respectively. After Fbank denoising, the phoneme error rates of all four models drop markedly, and the proposed A-R-B-CTC model still performs well compared with the other three models. Good results are also achieved at other signal-to-noise ratios.
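Keywords: phoneme recognition; Wave-U-Net; end-to-end; multi-head self-attention mechanism; Transformer model
Classification code: TN912.3 [Electronics and Telecommunications: Communication and Information Systems]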
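
The abstract reports all results as phoneme error rates. As a minimal sketch of how such a metric is commonly computed (the record does not state the authors' exact scoring procedure, so this is an assumption), the error rate can be taken as the total edit distance between reference and hypothesis phoneme sequences divided by the total number of reference phonemes. The function names and the sample phoneme labels below are hypothetical and chosen only for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences (lists of labels)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs, hyps):
    """Corpus-level error rate: total edits / total reference phonemes."""
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_phones = sum(len(r) for r in refs)
    if total_phones == 0:
        raise ValueError("reference set contains no phonemes")
    return total_errors / total_phones


if __name__ == "__main__":
    # Hypothetical labels: the hypothesis drops one of four reference phonemes.
    refs = [["n", "i", "h", "ao"]]
    hyps = [["n", "i", "ao"]]
    print(phoneme_error_rate(refs, hyps))  # 0.25
```

Summing edits over the whole test set before dividing, as done here, weights long utterances more heavily than averaging per-utterance rates; which convention the paper uses is not stated in this record.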