Phoneme recognition method based on a fusion network at low signal-to-noise ratio


Authors: HUANG Huibo; SHAO Yubin[1]; LONG Hua[1]; DU Qingzhi[1] (Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, P.R. China)

Affiliation: [1] Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500

Source: Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), 2024, No. 4, pp. 786-796 (11 pages)

Funding: Yunnan Provincial Key Laboratory of Media Convergence project (220235205).

Abstract: To address the low accuracy of phoneme recognition at low signal-to-noise ratios, a new recognition method is proposed. Fbank features are extracted from the speech and fed into the A-R-B-CTC model, which is built from a multi-head attention mechanism, ResNet, BLSTM, and CTC, for phoneme recognition. Wave-U-Net is then used to denoise the speech features Fbank, MFCC, GFCC, and log spectrum as images, and denoising the Fbank features is found to yield a lower phoneme error rate. The method is validated on the THCHS30 dataset in a 0 dB white-noise environment. The results show that, before Fbank denoising, the proposed A-R-B-CTC model reduces the average phoneme error rate by 4.38%, 2.5%, and 1.96% compared with the BLSTM-CTC, ResNet-BLSTM-CTC, and Transformer models, respectively. After Fbank denoising, the phoneme error rates of all four models drop markedly, and the proposed A-R-B-CTC model still performs strongly relative to the other three. Good results are also obtained at other signal-to-noise ratios.
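The evaluation setup named in the abstract (white noise at a fixed signal-to-noise ratio, scored by phoneme error rate) can be illustrated with a minimal sketch. This is not the authors' code. The function names `add_white_noise` and `phoneme_error_rate` are hypothetical, the noise is assumed to be Gaussian, and the phoneme error rate is assumed to follow the common definition of edit distance divided by reference length; the record itself does not state these details.

```python
import math
import random


def add_white_noise(signal, snr_db, seed=0):
    """Return the signal plus white noise scaled to the requested SNR in dB.

    Assumption: the noise is Gaussian; the source only says "white noise".
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = sum(n * n for n in noise) / len(noise)
    # Choose the scale so that 10*log10(signal_power / scaled_noise_power) == snr_db.
    target_noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    scale = math.sqrt(target_noise_power / noise_power)
    return [x + scale * n for x, n in zip(signal, noise)]


def phoneme_error_rate(reference, hypothesis):
    """Levenshtein distance between phoneme sequences divided by reference length."""
    if not reference:
        raise ValueError("reference must contain at least one phoneme")
    prev = list(range(len(hypothesis) + 1))
    for i, ref_ph in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_ph in enumerate(hypothesis, start=1):
            cost = 0 if ref_ph == hyp_ph else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(reference)


if __name__ == "__main__":
    # A 440 Hz tone sampled at 16 kHz stands in for a speech waveform.
    clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    noisy = add_white_noise(clean, snr_db=0.0)
    print(len(noisy))
    print(phoneme_error_rate(["n", "i", "h", "ao"], ["n", "i", "ao"]))
```

At 0 dB the scaled noise carries the same average power as the signal, which is the condition the abstract reports results for. The same function can generate other SNR conditions by changing `snr_db`.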

Keywords: phoneme recognition; Wave-U-Net; end-to-end; multi-head self-attention mechanism; Transformer model

Classification number: TN912.3 [Electronics and Telecommunications: Communication and Information Systems]

 
