Speech Dereverberation Based on Generative Adversarial Network with Additive Frequency Domain Decomposition

Authors: QUAN Haiyan[1], WANG Tao[1], ZHENG Zhiqing (School of Info. Eng. and Automation, Kunming Univ. of Technol., Kunming 650500, China)

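Affiliation: [1] School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China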

Source: Advanced Engineering Sciences (《工程科学与技术》), 2022, No. 2, pp. 180-187 (8 pages)

Funding: National Natural Science Foundation of China (41364002; 61861023).

Abstract: Because of path-delay effects, a reverberant speech signal contains different frequency components, and these components are modulated in a correlated way in the frequency domain. To reduce this high spectral correlation of reverberant speech, an improved generative adversarial network (GAN) algorithm based on additive frequency domain decomposition is proposed. First, a logarithm is applied to the short-time amplitude spectra of the reverberant speech, converting the modulated amplitude spectra into linear ones and thereby decomposing the convolved speech components. Next, the demodulated amplitude spectra are normalized with a sigmoid nonlinearity to balance the data distribution and are fed to a deep fully convolutional network to train the GAN model. Finally, through the adversarial learning mechanism between the generative and discriminative models, the distributional diversity of the reverberant speech and the source speech is learned effectively, guiding the generative model to reconstruct the enhanced speech more accurately. The algorithm was evaluated on the Aishell Chinese speech dataset. The dereverberation performance of GAN, FCN, and DNN models with and without additive frequency domain decomposition was compared, and the effectiveness of the proposed method was further demonstrated by differences between spectrograms.
Experimental results show that, under four different reverberation time settings, the GAN, FCN, and DNN models with additive frequency domain decomposition improve their PESQ, STOI, and LSD scores by about 10% compared with the same models without it. Additive frequency domain decomposition therefore effectively improves the performance of GAN for speech dereverberation. The method also generalizes well to a non-homologous test set.
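The abstract does not give the exact feature pipeline, so the following NumPy sketch only illustrates the idea behind the first two steps under stated assumptions. It assumes the multiplicative transfer function approximation Y(t, f) ≈ S(t, f)·H(f), which holds only roughly when the room impulse response is short relative to the STFT window; under it, log|Y| = log|S| + log|H|, so the room response becomes an additive per-bin offset. It also assumes per-bin mean/std standardization before the sigmoid, and it uses synthetic random spectra rather than Aishell data. The names `log_magnitude`, `sigmoid_normalize`, `sigmoid_denormalize`, and `simulate` are hypothetical, not taken from the paper.

```python
import numpy as np

EPS = 1e-8  # illustrative floor on magnitudes before the log (assumption)


def log_magnitude(spec: np.ndarray) -> np.ndarray:
    """Natural log of the short-time amplitude spectrum |spec|."""
    return np.log(np.maximum(np.abs(spec), EPS))


def sigmoid_normalize(log_mag: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Standardize per frequency bin, then squash into (0, 1) with a sigmoid.

    The per-bin standardization step is an assumption of this sketch.
    """
    z = (log_mag - mean) / std
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_denormalize(p: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Invert sigmoid_normalize with the logit, recovering the log magnitude."""
    p = np.clip(p, EPS, 1.0 - EPS)
    return np.log(p / (1.0 - p)) * std + mean


def simulate(n_frames: int = 100, n_bins: int = 257, seed: int = 0):
    """Synthetic STFTs under the assumed model Y(t, f) = S(t, f) * H(f)."""
    rng = np.random.default_rng(seed)
    S = rng.standard_normal((n_frames, n_bins)) + 1j * rng.standard_normal((n_frames, n_bins))
    H = rng.standard_normal(n_bins) + 1j * rng.standard_normal(n_bins)
    Y = S * H  # H broadcasts over the frame axis
    return S, H, Y


if __name__ == "__main__":
    S, H, Y = simulate()
    log_Y, log_S, log_H = log_magnitude(Y), log_magnitude(S), log_magnitude(H)
    # Multiplicative in the linear domain becomes additive in the log domain.
    print("max decomposition error:", np.max(np.abs(log_Y - (log_S + log_H))))

    mean, std = log_Y.mean(axis=0), log_Y.std(axis=0)
    feats = sigmoid_normalize(log_Y, mean, std)  # network-ready values in (0, 1)
    print("feature range:", feats.min(), feats.max())
    restored = sigmoid_denormalize(feats, mean, std)
    print("round-trip error:", np.max(np.abs(restored - log_Y)))
```

The decomposition error should be at floating-point level, and the frame-wise difference log|Y| - log|S| is the same constant log|H| in every frame. The logit inverse is included because, in a pipeline like this, a network output in the sigmoid domain would have to be mapped back to a log magnitude before resynthesis; how the authors handle the phase and the inverse transform is not stated in the abstract.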

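Keywords: speech dereverberation; logarithmic operation; additive frequency domain decomposition; generative adversarial network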

CLC number: TP912 [Automation and Computer Technology]
