Authors: 余传旗 (YU Chuanqi), 王婷婷 (WANG Tingting)[1], 郭海燕 (GUO Haiyan), 杨震 (YANG Zhen)[1,2]
Affiliations: [1] School of Communications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, Jiangsu, China; [2] National Local Joint Engineering Research Center for Communications and Network Technology, Nanjing University of Posts and Telecommunications, Nanjing 210003, Jiangsu, China
Source: Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition), 2024, No. 6, pp. 44-52 (9 pages)
Funding: Supported by the National Natural Science Foundation of China (62071242).
Abstract: Deep learning-based time-domain single-channel speech separation models have achieved significant success in noise-free scenarios. In noisy environments, however, their encoders tend to mistake noise features for source speech features, which degrades the accuracy of mask estimation and leads to suboptimal separation performance. To address this problem, we propose a time-domain speech separation model based on attention mechanisms that reduces the impact of noise on the separation task. First, because the channels of the time-domain encoder's output features differ in importance, we embed an efficient channel attention (ECA) module inside the encoder to weight the channels of the encoded features. Second, we adopt a graph attention network (GAT) to compute attention coefficients between adjacent frames and use them to aggregate the encoded features of neighboring frames, which implicitly reduces the influence of noise on mask estimation. Experimental results on the WHAM!, Libri2Mix-Noisy, and Libri3Mix-Noisy datasets show that the proposed GAT- and ECA-based DPRNN (GACA-DPRNN) outperforms the DPRNN baseline in terms of scale-invariant signal-to-noise ratio improvement (SI-SNRi) and signal-to-distortion ratio improvement (SDRi).
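The channel weighting described in the abstract can be illustrated with a minimal sketch of the general ECA idea: average each channel over time, mix neighboring channel averages with a small 1-D convolution, squash the result with a sigmoid, and rescale the channels. This is not the authors' implementation. The function names `eca_weights` and `apply_eca`, the kernel size of 3, and the uniform placeholder kernel are all assumptions made for illustration; in a trained model the kernel would be learned.

```python
import math


def eca_weights(features, kernel=None, kernel_size=3):
    """Return one attention weight in (0, 1) per channel.

    features: list of C channels, each a list of T frame values.
    kernel:   hypothetical 1-D kernel over the channel axis; a uniform
              placeholder is used when none is given (learned in practice).
    """
    if kernel is None:
        kernel = [1.0 / kernel_size] * kernel_size
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")

    # Squeeze: global average pooling over the time axis, one value per channel.
    pooled = [sum(channel) / len(channel) for channel in features]

    # Local cross-channel interaction: 1-D convolution along the channel
    # axis with zero padding, so the output keeps one value per channel.
    pad = len(kernel) // 2
    padded = [0.0] * pad + pooled + [0.0] * pad
    mixed = [
        sum(k * padded[c + j] for j, k in enumerate(kernel))
        for c in range(len(pooled))
    ]

    # Sigmoid gate maps each mixed value to a weight in (0, 1).
    return [1.0 / (1.0 + math.exp(-m)) for m in mixed]


def apply_eca(features, kernel=None, kernel_size=3):
    """Scale every frame of each channel by that channel's weight."""
    weights = eca_weights(features, kernel, kernel_size)
    return [[w * x for x in channel] for w, channel in zip(weights, features)]


if __name__ == "__main__":
    # Toy encoder output: 3 channels, 2 frames each.
    toy = [[1.0, 1.0], [0.0, 0.0], [-1.0, -1.0]]
    print(eca_weights(toy))
    print(apply_eca(toy))
```

The weights depend only on per-channel averages and a handful of kernel values, so a module of this kind adds very few parameters to the encoder.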
Classification: TN912.35 [Electronics and Telecommunications: Communication and Information Systems]