Authors: SUN Siyu (孙思雨), ZHANG Haijian (张海剑), CHEN Jiajia (陈佳佳); School of Electronic Information, Wuhan University, Wuhan 430072, China
Source: Radio Engineering (《无线电工程》), 2024, No. 3, pp. 580-588 (9 pages)
Funding: Natural Science Foundation of Hubei Province (2022CFB084).
Abstract: Building a neural beamformer is one of the main approaches to multi-channel speech enhancement: the network estimates beam weights and uses them to filter the multi-channel signals to recover clean speech. As with the estimation of spatial covariance matrices in conventional beamforming, spectral information and spatial cues play a crucial role in estimating the beam weights of a neural beamformer. Because they do not learn spectral and spatial information sufficiently, many existing methods fail to estimate the beam weights optimally. To address this challenge, a context feature extractor based on Fourier convolution is constructed. It has a global receptive field along the frequency axis, and a time-frequency convolution module is added to model temporal context, strengthening the learning of context from the input spectrogram. A new Convolutional Recurrent Network (CRN) structure is also adopted, in which the proposed context feature extractor is embedded in the encoder and decoder modules and a Convolutional Block Attention Module (CBAM) is embedded in the skip connections. The proposed CRN structure can fully capture time-frequency context and cross-channel spatial information from the input spectrograms. Experimental results show that the proposed method, with only 1.14 M parameters, achieves the best performance compared with current state-of-the-art baseline systems.
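The two mechanisms the abstract relies on can be illustrated with a small, dependency-free sketch. This is not the authors' implementation; the record gives no layer details. `apply_beam_weights` shows the usual filter-and-sum step a neural beamformer performs once weights are estimated, assuming the common convention Y(t,f) = sum over mics m of conj(W_m(t,f)) X_m(t,f). `fourier_filter_along_freq` is a simplified, single-channel global filter (a GFNet-style stand-in, not the paper's Fourier convolution layer). It transforms one frame along the frequency axis, multiplies pointwise by hypothetical `spectral_weights`, and transforms back. That is equivalent to a circular convolution whose kernel spans every frequency bin, which is why such layers have a global receptive field over frequency. All function and parameter names are illustrative assumptions.

```python
import cmath


def apply_beam_weights(weights, mixture):
    """Filter-and-sum beamforming (assumed convention):
    Y(t, f) = sum_m conj(W_m(t, f)) * X_m(t, f).
    `weights` and `mixture` are hypothetical nested lists indexed
    [mic][frame][freq_bin] holding complex STFT values."""
    num_mics = len(mixture)
    num_frames = len(mixture[0])
    num_bins = len(mixture[0][0])
    out = [[0j] * num_bins for _ in range(num_frames)]
    for m in range(num_mics):
        for t in range(num_frames):
            for f in range(num_bins):
                out[t][f] += weights[m][t][f].conjugate() * mixture[m][t][f]
    return out


def dft(x):
    """Naive forward DFT (O(n^2)), used here instead of an FFT for clarity."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * k * u / n) for k in range(n))
            for u in range(n)]


def idft(spectrum):
    """Naive inverse DFT matching `dft` above (includes the 1/n factor)."""
    n = len(spectrum)
    return [sum(spectrum[u] * cmath.exp(2j * cmath.pi * k * u / n) for u in range(n)) / n
            for k in range(n)]


def fourier_filter_along_freq(frame, spectral_weights):
    """Simplified global filter over the frequency axis of one frame:
    transform across bins, multiply pointwise by (hypothetical) learned
    `spectral_weights`, transform back. By the convolution theorem this is
    a circular convolution over all bins, so every output bin depends on
    every input bin."""
    transformed = dft(frame)
    return idft([s * w for s, w in zip(transformed, spectral_weights)])
```

In a trained model the weights would come from the network rather than being fixed, and a real layer would use an FFT (e.g. a framework's `rfft`) over feature channels instead of this per-frame loop; the sketch only makes the receptive-field argument concrete.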
Keywords: multi-channel; speech enhancement; neural beamformer; Fourier convolution; deep learning
CLC number: TN911.7 [Electronics and Telecommunications: Communication and Information Systems]