GAN Speech Enhancement Algorithm Based on Attention Mechanism and Mask Learning


Authors: LI Tongyan; PEI Haoyan; PEI Yan; CHEN Xu; WANG Tao (College of Communication Engineering, Chengdu University of Information Technology, Chengdu 610225, China)

Affiliation: [1] College of Communication Engineering, Chengdu University of Information Technology, Chengdu, Sichuan 610225, China

Source: Journal of Chengdu University of Information Technology, 2025, No. 2, pp. 137-142 (6 pages)

Funding: Project supported by the Science and Technology Department of Sichuan Province (2023YFS0422).

Abstract: Speech enhancement is an important component of automatic speech recognition (ASR). In recent years, generative adversarial networks (GANs) and their variants have shown steadily improving modeling capability for speech enhancement, but they still generalize poorly and struggle in low signal-to-noise ratio (SNR) environments. To address this, a GAN-based speech enhancement model called Mask-LAGAN is proposed, which combines an attention-based bidirectional long short-term memory network (BLSTM) with mask learning. The framework uses BLSTM and attention layers as the GAN generator and introduces mask learning for spectrum reconstruction. The enhanced signal is obtained by superimposing the filtered signal on the original signal and is then fed to the discriminator; the two networks are trained adversarially to achieve speech enhancement. The model is evaluated on the TIMIT dataset under different SNR conditions using speech evaluation metrics such as perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and CSIG. Experimental results show that the proposed model improves speech enhancement performance by an average of 11.8% compared with baseline GAN models such as SeGAN, and that it retains strong acoustic modeling capability in heavily noisy environments.
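The mask-based reconstruction step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the mask is bounded or how the filtered and original signals are combined, so the sigmoid mask, the function names `apply_mask` and `enhance`, and the mixing weight `alpha` are all assumptions made for illustration. In the real model the mask values would come from the attention-BLSTM generator; here they are supplied by hand.

```python
import math


def sigmoid(x: float) -> float:
    """Squash a raw score into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def apply_mask(noisy_mag, mask_logits):
    """Filter a magnitude spectrogram (frames x bins) with a bounded mask.

    Assumption: the mask is sigmoid(logit), so each time-frequency bin
    is scaled by a factor between 0 and 1.
    """
    return [
        [sigmoid(logit) * mag for mag, logit in zip(mag_row, logit_row)]
        for mag_row, logit_row in zip(noisy_mag, mask_logits)
    ]


def enhance(noisy_mag, mask_logits, alpha=0.5):
    """Overlay the filtered spectrogram on the original one.

    Assumption: the overlay is a convex combination with a hypothetical
    weight alpha; the abstract only says the two signals are superimposed.
    """
    filtered = apply_mask(noisy_mag, mask_logits)
    return [
        [alpha * f + (1.0 - alpha) * m for f, m in zip(f_row, m_row)]
        for f_row, m_row in zip(filtered, noisy_mag)
    ]


# Toy 2-frame, 2-bin magnitude spectrogram and hand-picked mask scores.
noisy = [[1.0, 2.0], [4.0, 0.5]]
logits = [[0.0, 10.0], [-10.0, 0.0]]
enhanced = enhance(noisy, logits, alpha=0.5)
print(enhanced)
```

With `alpha=1.0` the output is the purely masked spectrogram, and with `alpha=0.0` the noisy input passes through unchanged. Keeping part of the original signal is one plausible way to limit over-suppression at low SNR, though the paper's actual combination rule may differ.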

Keywords: speech enhancement; generative adversarial network; attention-BLSTM; mask reconstruction; low signal-to-noise ratio

Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
