Single-channel speech enhancement method based on time-frequency information gradient estimation

Authors: GAO Shengxiang [1,2]; FANG Yanwen; YU Zhengtao [1,2]; DONG Ling; MO Shangbin

Affiliations: [1] Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China; [2] Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, Yunnan, China; [3] Yunnan Key Laboratory of Media Convergence, Kunming 650500, Yunnan, China

Source: Journal of Xiamen University (Natural Science), 2024, No. 6, pp. 1051-1058 (8 pages)

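Funding: National Natural Science Foundation of China (62376111, U21B2027, 61972186, 62466030); Yunnan High-Tech Industry Development Project (201606); Yunnan Major Science and Technology Special Program (202303AP140008, 202103AA080015, 202302AD080003); Yunnan Fundamental Research Program (202001AS070014); Yunnan Reserve Talents for Academic and Technical Leaders (202105AC160018).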

Abstract: [Objective] Speech enhancement can improve the performance of speech translation systems in real-world noisy environments. This work addresses two problems of existing speech enhancement methods based on probabilistic diffusion models: the structure of the generated speech is damaged, and global features are difficult to model. [Methods] We propose a single-channel speech enhancement method based on time-frequency information gradient estimation. The complex spectrum of the speech is first fed into an encoder to extract deep representations. Residual fast Fourier convolution (Res-FFC) is then used to restore the generated speech and to model its global features, and time-domain speech information is incorporated during encoding and decoding. [Results] Experiments on the public Voice Bank-DEMAND dataset show that, compared with SGMSE, a complex time-frequency-domain speech enhancement network based on score-based generative models, the proposed method improves the objective metrics SI-SDR and WB-PESQ by 0.5 and 0.19, respectively. [Conclusions] By incorporating Res-FFC and time-domain speech information, the proposed method strengthens the model's ability to capture global speech features, effectively suppressing noise and improving speech quality.
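For readers unfamiliar with the SI-SDR metric reported in the abstract, the following is a minimal sketch of how it is commonly computed. It is not taken from the paper: the function name `si_sdr` is hypothetical, plain Python is used to avoid dependencies, and the mean-removal step that many toolkits apply beforehand is omitted for brevity.

```python
import math
from typing import Sequence


def si_sdr(reference: Sequence[float], estimate: Sequence[float]) -> float:
    """Scale-invariant signal-to-distortion ratio in dB (illustrative sketch)."""
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    ref_energy = sum(r * r for r in reference)
    if ref_energy == 0.0:
        raise ValueError("reference signal must not be all zeros")

    # Optimal scaling of the reference:
    # alpha = <estimate, reference> / ||reference||^2
    alpha = sum(e * r for e, r in zip(estimate, reference)) / ref_energy

    # Split the estimate into a scaled-target part and a residual part.
    target = [alpha * r for r in reference]
    residual = [e - t for e, t in zip(estimate, target)]

    target_energy = sum(t * t for t in target)
    residual_energy = sum(x * x for x in residual)

    if residual_energy == 0.0:
        return math.inf   # estimate is an exact scaled copy of the reference
    if target_energy == 0.0:
        return -math.inf  # estimate is orthogonal to the reference
    return 10.0 * math.log10(target_energy / residual_energy)


if __name__ == "__main__":
    clean = [1.0, 2.0, 3.0]
    enhanced = [1.1, 1.9, 3.2]
    print(f"SI-SDR: {si_sdr(clean, enhanced):.2f} dB")
```

Because the reference is rescaled before the residual is measured, multiplying the enhanced signal by a constant gain leaves the score unchanged, which is why the metric is called scale-invariant.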

Keywords: speech enhancement; probabilistic diffusion model; single-channel; fast Fourier convolution

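Classification number: TN912.35 [Electronics and Telecommunications: Communication and Information Systems]; TP183 [Electronics and Telecommunications: Information and Communication Engineering]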

 
