Authors: GAO Ge [1], WANG Xiao, ZENG Bang, YIN Wenbing, CHEN Yi [2]
Affiliations: [1] National Engineering Research Center for Multimedia Software (Wuhan University), Wuhan, Hubei 430072, China; [2] School of Computer Science, Central China Normal University, Wuhan, Hubei 430079, China
Source: Journal of Computer Applications (《计算机应用》), 2022, Issue S01, pp. 316-320 (5 pages)
Abstract: In frequency-domain speech enhancement algorithms, performance is capped by an inherent upper limit because the estimated magnitude spectrum does not match the noisy phase spectrum. In the time-domain speech enhancement framework, the model takes the time-domain waveform as input and the network directly learns the mapping between time-domain waveforms, which effectively avoids the invalid Short-Time Fourier Transform (STFT) problem. However, common time-domain speech enhancement algorithms that minimize the mean square error of the waveform do not model the frequency-domain features of speech optimally. To address this problem, a speech enhancement algorithm based on a time-frequency joint loss function is proposed. First, the time-frequency joint loss function is applied to the Wave-U-Net time-domain speech enhancement network; then first-order-norm and second-order-norm forms of the joint loss are designed and their effects on the enhancement network are analyzed; finally, the relatively best choice of loss function is obtained for speech communication tasks and for speech recognition tasks. Experimental results show that, compared with the enhancement network trained with a time-domain loss, the network using the best joint loss function for speech communication achieves relative improvements of 3.6% and 2.30% in Perceptual Evaluation of Speech Quality (PESQ) and Short-Time Objective Intelligibility (STOI), respectively, and the network using the best joint loss function for speech recognition achieves a relative reduction of 1.82% in Character Error Rate (CER). Compared with the Wave-U-Net time-domain speech enhancement network, the proposed algorithm suppresses noise better and performs better in the back-end speech recognition task.
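The abstract does not give the exact form of the joint loss, so the following is only a minimal sketch of the general idea: a weighted sum of a waveform-domain term and an STFT-magnitude term, each computed as either a first-order (mean absolute) or second-order (mean squared) error. The function names, the weight `alpha`, the frame length, and the hop size are all illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def stft_magnitude(x, frame_len=512, hop=128):
    """Magnitude spectrogram from Hann-windowed frames (assumed parameters)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # One row per frame, one column per non-negative frequency bin.
    return np.abs(np.fft.rfft(frames, axis=1))


def joint_loss(estimate, clean, alpha=0.5, order=1):
    """Hypothetical time-frequency joint loss.

    order=1 uses mean absolute error, order=2 uses mean squared error.
    alpha weights the time-domain term; (1 - alpha) weights the spectral term.
    """
    if order not in (1, 2):
        raise ValueError("order must be 1 or 2")

    def dist(a, b):
        return np.mean(np.abs(a - b) ** order)

    time_term = dist(estimate, clean)
    freq_term = dist(stft_magnitude(estimate), stft_magnitude(clean))
    return alpha * time_term + (1.0 - alpha) * freq_term


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(2048)
    noisy = clean + 0.1 * rng.standard_normal(2048)
    print("first-order joint loss:", joint_loss(noisy, clean, order=1))
    print("second-order joint loss:", joint_loss(noisy, clean, order=2))
```

In a training setup the same structure would be written with a differentiable framework so that gradients flow through the STFT; NumPy is used here only to keep the sketch self-contained.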
Keywords: time-domain speech enhancement; joint loss function; speech communication; speech recognition; deep learning
Classification number: TN912.35 [Electronics and Telecommunications: Communication and Information Systems]