Bone-Conducted Robust Speech Enhancement Based on Time-Frequency Domain Attention Mechanism and U-Net

Cited by: 2

Authors: ZHANG Yue (张玥); ZHANG Xiongwei (张雄伟); SUN Meng (孙蒙) (College of Command and Control Engineering, Army Engineering University of PLA, Nanjing, Jiangsu 210007, China)

Affiliation: [1] College of Command and Control Engineering, Army Engineering University of PLA, Nanjing, Jiangsu 210007, China

Source: Journal of Signal Processing (《信号处理》), 2022, No. 10, pp. 2134-2143 (10 pages)

Funding: National Natural Science Foundation of China (62071484).

Abstract: In recent years, neural-network-based methods have been widely applied to bone-conducted (BC) speech enhancement. However, because BC speech datasets are small, the high-frequency part of BC speech is missing, and the degree of high-frequency distortion differs across speakers, neural networks have difficulty learning the spectral characteristics of BC speech effectively. As a result, existing BC speech enhancement models perform poorly and lack robustness on BC speech from unseen speakers. To make full use of the time-frequency information of BC speech and guide the model to focus on the features of its low-frequency part, this paper proposes a BC speech enhancement method based on a time-frequency attention mechanism and U-Net. The method introduces the time-frequency attention mechanism into the U-Net structure. First, weights are assigned automatically according to the importance of the feature information of BC speech along the time and frequency directions. Next, the weighted BC speech spectrum is used as the input and the corresponding air-conducted (AC) speech spectrum as the target to train the U-Net. Finally, the trained enhancement model is used to reconstruct full-band speech from BC speech. Simulation experiments and visualization analysis show that, compared with the baseline U-Net structure and other attention mechanisms, the proposed method achieves higher PESQ and STOI objective scores on unseen-speaker BC speech datasets, and the enhanced speech is clearer.

Keywords: bone-conducted speech enhancement; time-frequency attention mechanism; U-Net

Classification: TN912.3 [Electronics and Telecommunications: Communication and Information Systems]
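
The abstract says that weights are assigned along the time and frequency directions of the BC spectrum before it enters the U-Net, but gives no architectural details. The NumPy sketch below is one possible illustration of that weighting step, not the authors' implementation. The function name `time_frequency_attention`, the projection matrices `w_freq` and `w_time`, the use of mean pooling with a sigmoid gate, and the spectrogram size are all assumptions made here for illustration.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, maps real values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def time_frequency_attention(spec, w_freq, w_time):
    """Reweight a magnitude spectrogram of shape (freq_bins, frames).

    w_freq: (freq_bins, freq_bins) hypothetical learned projection (frequency branch)
    w_time: (frames, frames) hypothetical learned projection (time branch)
    """
    # Frequency branch: pool over time to get one descriptor per frequency bin.
    freq_desc = spec.mean(axis=1)                # (freq_bins,)
    freq_weights = sigmoid(w_freq @ freq_desc)   # (freq_bins,), each in (0, 1)

    # Time branch: pool over frequency to get one descriptor per frame.
    time_desc = spec.mean(axis=0)                # (frames,)
    time_weights = sigmoid(w_time @ time_desc)   # (frames,), each in (0, 1)

    # The outer product yields one weight per time-frequency cell.
    mask = np.outer(freq_weights, time_weights)  # (freq_bins, frames)
    return spec * mask, freq_weights, time_weights


# Random stand-ins for a BC magnitude spectrogram and for trained parameters.
rng = np.random.default_rng(0)
bc_spec = np.abs(rng.standard_normal((129, 50)))
w_f = rng.standard_normal((129, 129)) * 0.1
w_t = rng.standard_normal((50, 50)) * 0.1

weighted, fw, tw = time_frequency_attention(bc_spec, w_f, w_t)
print(weighted.shape)
```

In a trained model the projections would be learned jointly with the U-Net, with the weighted BC spectrum as input and the AC spectrum as the regression target; here they are random, so the resulting weights carry no meaning beyond showing the shapes involved.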
