Multi-channel Speech Enhancement Based on Fourier Convolution  Cited by: 2

Authors: SUN Siyu; ZHANG Haijian[1]; CHEN Jiajia (School of Electronic Information, Wuhan University, Wuhan 430072, China)

Affiliation: [1] School of Electronic Information, Wuhan University, Wuhan 430072, Hubei, China

Source: Radio Engineering (《无线电工程》), 2024, No. 3, pp. 580-588 (9 pages)

Funding: Natural Science Foundation of Hubei Province (2022CFB084).

Abstract: Constructing a neural beamformer is one of the main approaches to multi-channel speech enhancement: it estimates beam weights and uses them to filter the multi-channel signals to obtain clean speech. As with the estimation of the spatial covariance matrix in traditional beamforming, spectral information and spatial cues play a crucial role in estimating the beam weights of a neural beamformer. Because they do not adequately learn spectral and spatial information, many existing methods cannot estimate the beam weights optimally. To address this challenge, a context feature extractor based on Fourier convolution is constructed. It has a global receptive field along the frequency axis, and a time-frequency convolutional module is added to model temporal context, which strengthens the learning of context information from the input spectrograms. A new Convolutional Recurrent Network (CRN) structure is adopted, in which the proposed context feature extractor is embedded in the encoder and decoder modules and a Convolutional Block Attention Module (CBAM) is embedded in the skip connections. The proposed CRN structure can fully capture time-frequency context information and cross-channel spatial information from the input feature spectrograms. Experimental results show that the proposed method has only 1.14 M parameters and achieves the best performance among the compared state-of-the-art baseline systems.
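
As an illustration of two ideas named in the abstract, the following NumPy sketch shows (1) filter-and-sum beamforming, where per-bin complex weights are applied to a multi-channel spectrogram, and (2) why a pointwise operation in the Fourier domain along the frequency axis has a global receptive field over frequency. It is a minimal sketch under assumptions, not the paper's architecture: the function names, array shapes, the conjugate-weight convention, and the fixed (non-learned) spectral weights are all hypothetical choices made for illustration.

```python
import numpy as np


def apply_beam_weights(weights, mixture):
    """Filter-and-sum beamforming (illustrative convention, assumed here).

    weights, mixture: complex arrays of shape (channels, frames, freq_bins).
    Returns a single-channel complex spectrogram of shape (frames, freq_bins).
    """
    # y(t, f) = sum over channels c of conj(w_c(t, f)) * x_c(t, f)
    return np.sum(np.conj(weights) * mixture, axis=0)


def fourier_conv_freq(features, w_real, w_imag):
    """Toy spectral transform along the frequency axis (hypothetical helper).

    features: real array of shape (frames, freq_bins).
    w_real, w_imag: real arrays of shape (freq_bins // 2 + 1,), standing in
    for weights that a network would learn.
    """
    n_freq = features.shape[-1]
    # Transform each frame along its frequency axis.
    spec = np.fft.rfft(features, axis=-1)
    # Pointwise scaling in the Fourier domain; this equals a circular
    # convolution over frequency whose kernel spans the whole axis.
    spec = spec * (w_real + 1j * w_imag)
    # Return to the original domain with the original number of bins.
    return np.fft.irfft(spec, n=n_freq, axis=-1)


# Small demonstration with random data (all sizes are arbitrary).
rng = np.random.default_rng(0)
C, T, F = 4, 10, 16
mixture = rng.standard_normal((C, T, F)) + 1j * rng.standard_normal((C, T, F))
weights = np.full((C, T, F), 1.0 / C, dtype=complex)  # plain channel average
enhanced = apply_beam_weights(weights, mixture)

features = np.abs(mixture[0])  # magnitude spectrogram of one channel
w_real = rng.standard_normal(F // 2 + 1)
w_imag = rng.standard_normal(F // 2 + 1)
context = fourier_conv_freq(features, w_real, w_imag)
```

Because multiplication in the Fourier domain is equivalent to a circular convolution whose kernel covers the entire axis, every output frequency bin in `context` can depend on every input bin, which is the sense in which such a layer has a global receptive field along frequency. In a neural beamformer the `weights` array would be predicted by the network rather than fixed as it is here.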

Keywords: multi-channel; speech enhancement; neural beamformer; Fourier convolution; deep learning

Classification code: TN911.7 [Electronics and Telecommunications - Communication and Information Systems]

 
