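Authors: QUAN Haiyan[1], WANG Tao[1], ZHENG Zhiqing (School of Info. Eng. and Automation, Kunming Univ. of Technol., Kunming 650500, China)
Affiliation: [1] School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China
Source: Advanced Engineering Sciences (《工程科学与技术》), 2022, No. 2, pp. 180-187 (8 pages)
Funding: National Natural Science Foundation of China (41364002; 61861023).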
Abstract: A reverberant speech signal contains different frequency components induced by path delay, and these components are modulated together in the frequency domain, so they are highly correlated. To reduce this high spectral correlation, an improved generative adversarial network (GAN) algorithm based on additive frequency domain decomposition is proposed. First, the logarithm is applied to the short-time amplitude spectrum of the reverberant speech, which converts the modulated amplitude spectrum into a linear one and thereby decomposes the convolved speech components. Next, the result is normalized with the sigmoid nonlinear function to balance the data distribution, and the demodulated amplitude spectrum is fed to a deep fully convolutional network to train the GAN model. Finally, through the adversarial learning mechanism between the generative model and the discriminative model, the distribution diversity of the reverberant speech and the source speech is learned effectively, which guides the generative model to reconstruct the enhanced speech more accurately. The Aishell Chinese speech data set is used to evaluate the algorithm. The dereverberation performance of GAN, FCN, and DNN models with and without additive frequency domain decomposition is compared, and differences between spectrograms are used to demonstrate the effectiveness of the proposed method. Experimental results show that, under four different reverberation time parameters, the PESQ, STOI, and LSD evaluation scores of the GAN, FCN, and DNN models with additive frequency domain decomposition improve by about 10% over those of the same models without it. Additive frequency domain decomposition can therefore effectively improve the performance of a GAN in speech dereverberation. The method also generalizes well on a non-homologous test set.
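The preprocessing idea in the abstract (a logarithm followed by sigmoid normalization) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the random "source" signal, the toy decaying impulse response `rir`, the helper name `log_sigmoid_features`, and the `eps` guard are all assumptions made for illustration. The sketch uses one full-length DFT, where convolution in time is exactly a product of magnitude spectra; with the short-time spectra used in the paper, this relation would hold only approximately.

```python
import numpy as np


def log_sigmoid_features(magnitude, eps=1e-8):
    """Log-compress a magnitude spectrum, then squash it into (0, 1).

    Hypothetical helper: `eps` only guards against log(0).
    """
    log_mag = np.log(magnitude + eps)
    return 1.0 / (1.0 + np.exp(-log_mag))


rng = np.random.default_rng(0)

# Stand-ins for real data (assumptions): a random "source" frame and
# a toy exponentially decaying room impulse response.
source = rng.standard_normal(256)
rir = np.exp(-np.arange(64) / 8.0) * rng.standard_normal(64)

# Reverberation modelled as linear convolution of source and impulse response.
reverberant = np.convolve(source, rir)
n_fft = len(reverberant)  # long enough that the DFT product is exact

S = np.abs(np.fft.rfft(source, n_fft))
H = np.abs(np.fft.rfft(rir, n_fft))
X = np.abs(np.fft.rfft(reverberant, n_fft))

# Multiplicative in the magnitude domain, additive after the logarithm:
# log|X| = log|S| + log|H|
max_err = np.max(np.abs(np.log(X) - (np.log(S) + np.log(H))))

# Sigmoid-normalized log-magnitude features, as a network input might look.
features = log_sigmoid_features(X)

print(f"max |log X - (log S + log H)| = {max_err:.2e}")
print(f"feature range: [{features.min():.4f}, {features.max():.4f}]")
```

After the logarithm, the source and room contributions appear as a sum rather than a product, which is the sense in which the decomposition is additive. The sigmoid then maps the log-magnitudes into a bounded range before they would be passed to a network.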