A monaural speech enhancement method combining accurate ratio masking and deep neural network  Cited by: 6

Speech enhancement combining accurate ratio masking and deep neural network


Authors: BAI Haojun; ZHANG Tianqi[1]; LIU Jianxing; YE Shaopeng (School of Communication and Information Engineering, Chongqing Key Laboratory of Signal and Information Processing (CQKLS&IP), Chongqing University of Posts and Telecommunications (CQUPT), Chongqing 400065)

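Affiliation: [1] Chongqing Key Laboratory of Signal and Information Processing, School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065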

Source: Acta Acustica (《声学学报》), 2022, No. 3, pp. 394-404 (11 pages)

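Funding: Supported by the National Natural Science Foundation of China (61671095, 61702065, 61701067, 61771085); the Construction Project of the Chongqing Municipal Key Laboratory of Signal and Information Processing (CSTC2009CA2003); the Natural Science Foundation of Chongqing (cstc2021jcyj-msxmX0836); and the Scientific Research Project of the Chongqing Municipal Education Commission (KJ1600427, KJ1600429).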

Abstract: Current supervised speech enhancement methods neglect how the similarity of the amplitude spectra of clean speech, noise, and noisy speech affects enhancement performance. To address this, a monaural speech enhancement method combining Accurate Ratio Masking (ARM) and a Deep Neural Network (DNN) is proposed. First, using the normalized cross-correlation coefficients of the amplitude spectra between clean speech and noisy speech, and between noise and noisy speech, an accurate ratio mask based on the ideal ratio mask in the time-frequency domain is designed as the target mask. Then, a DNN whose training targets are the amplitude spectra of clean speech and noise serves as the baseline: the target mask is estimated from the output of this DNN, the baseline DNN and the target mask are jointly optimized, and the enhanced speech is estimated from the noisy speech using the target mask. In addition, to exploit the discriminative information between clean speech and noise, a discriminative training function replaces the Mean Square Error (MSE) function as the objective function of the baseline DNN, making the network output more accurate. Experiments show that the discriminative training function improves the enhancement performance of both the baseline DNN and the overall jointly optimized network. Under both matched and mismatched noise, the proposed method achieves higher average Perceptual Evaluation of Speech Quality (PESQ) and Short-Time Objective Intelligibility (STOI) scores than other common DNN methods; the enhanced speech retains more speech components while suppressing noise more effectively.
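The building blocks named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the ARM formula or the exact discriminative training function, so nothing below reproduces the paper's definitions. The sketch shows only (a) one common textbook form of the ideal ratio mask, (b) a normalized cross-correlation coefficient between two amplitude-spectrum frames, and (c) one widely used style of discriminative objective that adds a cross-term penalty to the squared error. All function names, the `gamma` weight, and the toy magnitudes are assumptions made for illustration.

```python
import math


def ideal_ratio_mask(speech_mag, noise_mag, eps=1e-12):
    """One common IRM form per frequency bin: sqrt(S^2 / (S^2 + N^2))."""
    return [math.sqrt((s * s) / (s * s + n * n + eps))
            for s, n in zip(speech_mag, noise_mag)]


def normalized_cross_correlation(a, b, eps=1e-12):
    """Normalized cross-correlation of two magnitude frames: <a, b> / (|a| |b|)."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / (den + eps)


def squared_error(x, y):
    """Sum of squared differences between two equal-length frames."""
    return sum((u - v) ** 2 for u, v in zip(x, y))


def discriminative_loss(s_hat, n_hat, s, n, gamma=0.05):
    """Assumed illustrative objective, not the paper's function.

    Rewards estimates close to their own target and penalizes estimates
    close to the other source; gamma = 0 reduces it to a plain squared error.
    """
    fit = squared_error(s_hat, s) + squared_error(n_hat, n)
    cross = squared_error(s_hat, n) + squared_error(n_hat, s)
    return fit - gamma * cross


# Toy single-frame magnitudes (hypothetical values).
speech = [3.0, 0.0, 1.0]
noise = [4.0, 2.0, 1.0]
# Simplifying assumption for this sketch: magnitudes add in each bin.
noisy = [s + n for s, n in zip(speech, noise)]

irm = ideal_ratio_mask(speech, noise)
rho_speech = normalized_cross_correlation(speech, noisy)
rho_noise = normalized_cross_correlation(noise, noisy)
loss = discriminative_loss(speech, noise, speech, noise, gamma=0.1)
```

In this sketch `rho_speech` and `rho_noise` measure how closely each source's spectrum resembles the noisy spectrum in the frame, which is the kind of similarity information the abstract says the ARM target exploits; how the paper combines these coefficients with the ratio mask is not stated here.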

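Keywords: deep neural network; normalized cross-correlation coefficient; amplitude spectrum; speech enhancement; discriminative; joint optimization; intelligibility; masking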

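Classification: TN912.35 [Electronics and Telecommunications: Communication and Information Systems]; TP183 [Electronics and Telecommunications: Information and Communication Engineering]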

 
