GAN Speech Enhancement Algorithm with Multi-stage Generator and Time-frequency Discriminator  Cited by: 2

Authors: CHEN Yu[1], YIN Wen-Bing, GAO Ge[2], WANG Xiao, ZENG Bang, CHEN Yi[3]

Affiliations: [1] First Research Institute of the Ministry of Public Security of PRC, Beijing 100048, China; [2] National Engineering Research Center for Multimedia Software, Wuhan University, Wuhan 430072, China; [3] School of Computer Science, Central China Normal University, Wuhan 430077, China

Source: Computer Systems & Applications, 2022, No. 7, pp. 179-185 (7 pages)

Abstract: The traditional speech enhancement generative adversarial network (SEGAN) takes the time-domain speech waveform as its mapping target. At low signal-to-noise ratios (SNRs), the time-domain waveform is drowned in noise, so the enhancement performance of SEGAN degrades sharply and speech distortion becomes severe. To address this problem, a multi-stage time-frequency SEGAN (MS-TFSEGAN) is proposed for speech enhancement. MS-TFSEGAN combines a multi-stage generator with dual time-domain and frequency-domain discriminators, continuously refining the mapping result while capturing both time-domain and frequency-domain information. In addition, to further improve the model's ability to learn frequency-domain detail, MS-TFSEGAN introduces a frequency-domain L1 loss into the generator loss function. Experimental results show that, under low-SNR conditions, MS-TFSEGAN improves speech quality and intelligibility by about 13.32% and 8.97%, respectively, compared with SEGAN, and achieves a relative improvement of 7.3% in CER when used as the front end of a speech recognition system.
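The idea of adding a frequency-domain L1 term to the generator loss can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the spectral representation, the framing parameters, the loss weights, or whether a time-domain L1 term is retained. The sketch assumes an unwindowed magnitude spectrogram computed with a naive framed DFT, and the names `stft_magnitude`, `l1_time`, `l1_freq`, `generator_loss`, `lam_time`, and `lam_freq` are hypothetical. The adversarial terms are passed in as plain numbers standing in for the outputs of the two discriminators.

```python
import cmath
import math


def stft_magnitude(signal, frame_len=8, hop=4):
    """Magnitude spectrogram from a naive framed DFT (no window, for brevity)."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        spectrum = []
        for k in range(frame_len // 2 + 1):  # non-negative frequency bins only
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n, x in enumerate(frame)
            )
            spectrum.append(abs(coeff))
        frames.append(spectrum)
    return frames


def l1_time(enhanced, clean):
    """Mean absolute error between two waveforms of equal length."""
    return sum(abs(e - c) for e, c in zip(enhanced, clean)) / len(clean)


def l1_freq(enhanced, clean, frame_len=8, hop=4):
    """Mean absolute error between the two magnitude spectrograms."""
    mag_e = stft_magnitude(enhanced, frame_len, hop)
    mag_c = stft_magnitude(clean, frame_len, hop)
    total, count = 0.0, 0
    for frame_e, frame_c in zip(mag_e, mag_c):
        for a, b in zip(frame_e, frame_c):
            total += abs(a - b)
            count += 1
    return total / count


def generator_loss(adv_time, adv_freq, enhanced, clean,
                   lam_time=1.0, lam_freq=1.0):
    """Assumed combination: two adversarial terms plus weighted L1 terms.

    The weights are placeholders, not values taken from the paper.
    """
    return (adv_time + adv_freq
            + lam_time * l1_time(enhanced, clean)
            + lam_freq * l1_freq(enhanced, clean))


if __name__ == "__main__":
    clean = [math.sin(2 * math.pi * n / 8) for n in range(16)]
    noisy = [c + 0.1 * (-1) ** n for n, c in enumerate(clean)]
    print("time-domain L1:", l1_time(noisy, clean))
    print("frequency-domain L1:", l1_freq(noisy, clean))
```

In a real training setup the same two L1 terms would be computed on tensors with a windowed STFT from a deep learning framework so that gradients flow back to the generator; the pure-Python version here only shows what each term measures.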

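Keywords: speech enhancement; generative adversarial network; low signal-to-noise ratio; speech quality; speech intelligibility; speech recognition; multi-stage model; deep learning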

Classification code: TN912.35 [Electronics and Telecommunications: Communication and Information Systems]

 
