Authors: ZHANG Yue (张玥); ZHANG Xiongwei (张雄伟); SUN Meng (孙蒙)
Affiliation: [1] College of Command and Control Engineering, Army Engineering University of PLA, Nanjing, Jiangsu 210007, China
Source: Journal of Signal Processing (《信号处理》), 2022, No. 10, pp. 2134-2143 (10 pages)
Funding: National Natural Science Foundation of China (62071484).
Abstract: In recent years, neural-network-based methods have been widely applied to bone-conducted (BC) speech enhancement. However, because BC speech datasets are small, BC speech lacks high-frequency content, and the degree of high-frequency distortion varies across speakers, neural networks have difficulty learning the spectral characteristics of BC speech effectively. As a result, existing BC speech enhancement models perform poorly on datasets of unseen speakers and lack robustness. To make full use of the time-frequency information of BC speech and guide the model to focus on its low-frequency features, this paper proposes a BC speech enhancement method based on a time-frequency attention mechanism and U-Net. The method introduces the time-frequency attention mechanism into the U-Net structure. First, weights are assigned automatically to the features of BC speech along the time and frequency directions according to their importance. Then the weighted BC speech spectrum is used as the input, and the corresponding air-conducted (AC) speech spectrum as the target, to train the U-Net. Finally, the trained enhancement model reconstructs full-band speech from BC speech. Simulation experiments and visual analysis show that, compared with the baseline U-Net and other attention mechanisms, the proposed method achieves higher PESQ and STOI objective scores on unseen-speaker BC speech datasets, and the enhanced speech is clearer.
Classification: TN912.3 [Electronics and Telecommunications: Communication and Information Systems]
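
The weighting step described in the abstract (assign a weight to each frequency bin and each time frame, then scale the BC spectrum before it enters the U-Net) can be illustrated with a minimal sketch. This is not the authors' module: the abstract does not specify how the attention weights are computed, and in the paper they are presumably learned during training. Here the weights are a fixed stand-in, assumed to be a sigmoid of mean-pooled magnitudes, and the function names `time_frequency_weights` and `apply_attention` are hypothetical. The U-Net itself and the AC target spectrum are not shown.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def time_frequency_weights(spec):
    """Return (freq_w, time_w) for a magnitude spectrogram.

    spec is a list of F rows (frequency bins), each holding T frames.
    ASSUMPTION: weights are a sigmoid of mean-pooled magnitudes, centred
    on the global mean. A real attention module would learn this mapping.
    """
    n_freq, n_time = len(spec), len(spec[0])
    # Pool over time to describe each frequency bin.
    freq_mean = [sum(row) / n_time for row in spec]
    # Pool over frequency to describe each time frame.
    time_mean = [sum(spec[f][t] for f in range(n_freq)) / n_freq
                 for t in range(n_time)]
    global_mean = sum(freq_mean) / n_freq
    freq_w = [sigmoid(m - global_mean) for m in freq_mean]
    time_w = [sigmoid(m - global_mean) for m in time_mean]
    return freq_w, time_w


def apply_attention(spec):
    """Scale every time-frequency cell by its frequency and time weights."""
    freq_w, time_w = time_frequency_weights(spec)
    return [[spec[f][t] * freq_w[f] * time_w[t] for t in range(len(time_w))]
            for f in range(len(freq_w))]


# Toy BC-like spectrogram: energy in the low bins, nothing in the top bin.
toy_spec = [
    [4.0, 4.0, 2.0],  # low-frequency bin
    [2.0, 2.0, 1.0],  # mid-frequency bin
    [0.0, 0.0, 0.0],  # high-frequency bin (missing in BC speech)
]
weighted = apply_attention(toy_spec)
```

With this stand-in, bins carrying more energy (the low-frequency part of the toy spectrogram) receive larger weights, which mirrors the stated goal of steering the model toward low-frequency features. In a training setup, `weighted` would take the place of the raw BC spectrum as the network input.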