Authors: ZHANG Xiaoyan (张晓艳); ZHANG Tianqi (张天骐) [1]; GE Wanying (葛宛营); BAI Yangliu (白杨柳)
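Affiliation: [1] School of Communication and Information Engineering / Chongqing Key Laboratory of Signal and Information Processing, Chongqing University of Posts and Telecommunications, Chongqing 400065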
Source: Acta Acustica (《声学学报》), 2021, No. 3, pp. 471-480 (10 pages)
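Funding: Supported by the National Natural Science Foundation of China (61671095, 61702065, 61701067, 61771085); the Chongqing Municipal Key Laboratory of Signal and Information Processing Construction Project (CSTC2009CA2003); the Chongqing Graduate Research and Innovation Project (CYS19248); and the Research Project of the Chongqing Municipal Education Commission (KJ1600427, KJ1600429).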
Abstract: The accuracy of noise estimation directly affects the quality of a speech enhancement algorithm. To improve the noise suppression of current speech enhancement algorithms and to solve the associated unconstrained optimization problem effectively, a time-frequency mask optimization algorithm that combines a deep neural network (DNN) with convex optimization is proposed for monaural speech enhancement. First, the power spectrum of the noisy speech is extracted as the input feature of the DNN. Second, the inter-channel correlation factor (ICC Factor) between the noise and the noisy speech is taken as the training target of the DNN. Third, the objective function of the convex optimization is constructed from the correlation factor produced by the DNN model. Finally, combining the DNN with convex optimization, a new hybrid conjugate gradient method iteratively refines the initial mask, and the enhanced speech is synthesized with the updated mask. Simulation results show that, under different background noises at low SNR, the updated mask yields better log spectral distance (LSD), perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and segmental signal-to-noise ratio (segSNR) scores than the mask before refinement, improving overall speech quality while effectively suppressing noise.
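Keywords: noisy speech; gradient descent; DNN; noise suppression; cross-correlation coefficient; low SNR; enhancement algorithm; convex optimization; deep neural network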
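Classification: TN912.35 [Electronics and Telecommunications: Communication and Information Systems]; TP183 [Electronics and Telecommunications: Information and Communication Engineering]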
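Two of the objective measures named in the abstract, segSNR and LSD, can be illustrated with a short sketch. This is not the authors' evaluation code: the function names, the frame length, the clamping range of -10 to 35 dB, and the `eps` floor are assumptions chosen for illustration, and published definitions of both measures vary in such details.

```python
import math


def segmental_snr(clean, enhanced, frame_len=256, lo=-10.0, hi=35.0):
    """Mean per-frame SNR in dB over non-overlapping frames.

    frame_len and the [lo, hi] clamp are illustrative assumptions.
    """
    n = min(len(clean), len(enhanced))
    values = []
    for start in range(0, n - frame_len + 1, frame_len):
        s = clean[start:start + frame_len]
        e = enhanced[start:start + frame_len]
        signal_energy = sum(x * x for x in s)
        error_energy = sum((x - y) ** 2 for x, y in zip(s, e))
        if signal_energy == 0.0:
            continue  # skip silent frames, where the ratio is undefined
        if error_energy == 0.0:
            snr = hi  # perfect reconstruction: use the upper clamp
        else:
            snr = 10.0 * math.log10(signal_energy / error_energy)
        values.append(min(max(snr, lo), hi))
    if not values:
        raise ValueError("no usable frames")
    return sum(values) / len(values)


def log_spectral_distance(clean_power, enhanced_power, eps=1e-12):
    """Mean over frames of the RMS difference between log power spectra (dB).

    Both arguments are lists of frames; each frame is a list of power values.
    eps is an assumed floor that avoids log10(0).
    """
    distances = []
    for clean_frame, enhanced_frame in zip(clean_power, enhanced_power):
        squared = [
            (10.0 * math.log10(c + eps) - 10.0 * math.log10(e + eps)) ** 2
            for c, e in zip(clean_frame, enhanced_frame)
        ]
        distances.append(math.sqrt(sum(squared) / len(squared)))
    if not distances:
        raise ValueError("no frames")
    return sum(distances) / len(distances)
```

A higher segSNR and a lower LSD both indicate that the enhanced signal is closer to the clean reference. PESQ and STOI are more involved perceptual measures and are not sketched here.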