Speech Enhancement Based on Time-Frequency Domain GAN (cited by: 5)


Authors: YIN Wen-bing; GAO Ge[1]; ZENG Bang; WANG Xiao; CHEN Yi[2]

Affiliations: [1] National Engineering Research Center for Multimedia Software, Wuhan University, Wuhan 430072, China; [2] School of Computer Science, Central China Normal University, Wuhan 430077, China

Source: Computer Science (《计算机科学》), 2022, No. 6, pp. 187-192 (6 pages)

Abstract: The conventional speech enhancement algorithm based on generative adversarial networks (SEGAN) enhances speech in the time domain and completely ignores how speech samples are distributed in the frequency domain. At low signal-to-noise ratios, the speech signal is submerged in noise and the time-domain distribution of the noisy speech is hard to capture, so SEGAN's enhancement performance drops sharply and the quality and intelligibility of its enhanced speech are very low. To address this problem, the paper proposes a speech enhancement algorithm based on a time-frequency domain generative adversarial network (Time-Frequency Domain SEGAN, TFSEGAN). TFSEGAN adopts a dual-discriminator structure covering the time and frequency domains, together with a time-frequency L1 loss function. The time-domain discriminator takes time-domain features of the speech samples as input, and the frequency-domain discriminator takes their frequency-domain features. During training, the time-domain discriminator uses the time-domain distribution of the speech samples as its discrimination criterion, while the frequency-domain discriminator uses their frequency-domain distribution. Driven by both discriminators, the TFSEGAN generator can learn the distribution of speech samples in the time and frequency domains simultaneously. Experiments show that, at low signal-to-noise ratios, TFSEGAN improves speech quality and intelligibility by about 17.45% and 11.75%, respectively, compared with SEGAN.
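The abstract names a time-frequency L1 loss but does not give its formula. As a rough illustration only, the sketch below combines a time-domain L1 term with an L1 term on framed DFT magnitudes. Everything specific here is an assumption and not taken from the paper: the function names, the use of magnitude spectra as the frequency-domain feature, the frame length and hop, and the weights `w_time` and `w_freq`.

```python
import cmath
import math


def frame_signal(x, frame_len, hop):
    """Split a 1-D sequence into overlapping frames (trailing partial frame dropped)."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def magnitude_spectrum(frame):
    """Magnitudes of the one-sided DFT of one frame (naive O(N^2) DFT, for clarity)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


def l1_mean(a, b):
    """Mean absolute difference between two equal-length sequences."""
    return sum(abs(u - v) for u, v in zip(a, b)) / len(a)


def time_frequency_l1(enhanced, clean, frame_len=8, hop=4, w_time=1.0, w_freq=1.0):
    """Hypothetical time-frequency L1 loss: weighted time-domain L1 plus spectral L1.

    Both signals must have the same length, at least frame_len samples.
    The weights and framing parameters are illustrative assumptions.
    """
    time_term = l1_mean(enhanced, clean)
    enh_frames = frame_signal(enhanced, frame_len, hop)
    cln_frames = frame_signal(clean, frame_len, hop)
    per_frame = [
        l1_mean(magnitude_spectrum(e), magnitude_spectrum(c))
        for e, c in zip(enh_frames, cln_frames)
    ]
    freq_term = sum(per_frame) / len(per_frame)
    return w_time * time_term + w_freq * freq_term
```

In a dual-discriminator setup of the kind described, a term like this would typically be added to the generator's adversarial losses, so that the generator is penalised for deviating from the clean speech in both waveform and spectrum. A practical implementation would use a windowed STFT from a numerical library in place of the naive DFT.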

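Keywords: speech enhancement; generative adversarial network; time-frequency domain; low signal-to-noise ratio; speech quality; speech intelligibility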

Classification code: TN912.35 [Electronics and Telecommunications: Communication and Information Systems]

 
