Double complex convolution and attention aggregating recurrent network for speech enhancement  Cited by: 5


Authors: YU Bennian; ZHAN Yongzhao[1]; MAO Qirong[1,2]; DONG Wenlong; LIU Honglin

Affiliations: [1] School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013, China; [2] Jiangsu Province Big Data Ubiquitous Perception and Intelligent Agriculture Application Engineering Research Center, Zhenjiang, Jiangsu 212013, China

Source: Journal of Computer Applications (《计算机应用》), 2023, No. 10, pp. 3217-3224 (8 pages)

Funding: Key Research and Development Program of Jiangsu Province (BE2020036).

Abstract: To address the limited representation of correlation information among spectrogram features and the unsatisfactory denoising performance of existing speech enhancement methods, a speech enhancement method based on a Double Complex Convolution and Attention Aggregating Recurrent Network (DCCARN) is proposed. First, a double complex convolutional network is built to encode, in two branches, the spectrogram features obtained by the short-time Fourier transform. Second, the encodings of the two branches are passed through inter-feature-block and intra-feature-block attention mechanisms, respectively, to re-weight the different speech feature information. Third, a Long Short-Term Memory (LSTM) network processes the long-term sequence information, and two decoders restore the spectrogram features, which are then aggregated. Finally, the target speech waveform is generated by the inverse short-time Fourier transform, thereby suppressing the noise. Experiments on the public dataset VBD (Voice Bank+DEMAND) and on the noise-added TIMIT dataset show that, compared with the phase-aware Deep Complex Convolution Recurrent Network (DCCRN), DCCARN improves the Perceptual Evaluation of Speech Quality (PESQ) score by 0.150 and by 0.077 to 0.087, respectively. These results indicate that the proposed method captures the correlation information of spectrogram features more accurately, suppresses noise more effectively, and improves speech intelligibility.
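The abstract does not spell out how the complex convolution is computed. As a minimal sketch, the block below shows the decomposition commonly used in complex-valued networks, in which one complex convolution is built from four real convolutions. It is an illustration under that assumption, not the authors' implementation: the function names (`real_conv1d`, `complex_conv1d`, `direct_complex_conv1d`) are hypothetical, and a one-dimensional, valid-mode operation stands in for the two-dimensional layers that act on a spectrogram.

```python
def real_conv1d(x, w):
    """Valid-mode 1-D cross-correlation of two real sequences."""
    n = len(x) - len(w) + 1
    return [sum(x[i + k] * w[k] for k in range(len(w))) for i in range(n)]


def complex_conv1d(x, w):
    """Complex convolution assembled from four real convolutions.

    With x = xr + j*xi and w = wr + j*wi:
        real part = wr*xr - wi*xi
        imag part = wr*xi + wi*xr
    (hypothetical helper, for illustration only)
    """
    xr = [v.real for v in x]
    xi = [v.imag for v in x]
    wr = [v.real for v in w]
    wi = [v.imag for v in w]
    rr = real_conv1d(xr, wr)  # wr * xr
    ii = real_conv1d(xi, wi)  # wi * xi
    ri = real_conv1d(xi, wr)  # wr * xi
    ir = real_conv1d(xr, wi)  # wi * xr
    return [complex(a - b, c + d) for a, b, c, d in zip(rr, ii, ri, ir)]


def direct_complex_conv1d(x, w):
    """Reference result using Python's built-in complex arithmetic."""
    n = len(x) - len(w) + 1
    return [sum(x[i + k] * w[k] for k in range(len(w))) for i in range(n)]


if __name__ == "__main__":
    x = [1 + 2j, 3 - 1j, 0 + 1j, 2 + 0j]  # toy complex "spectrogram" row
    w = [1 - 1j, 2 + 1j]                  # toy complex kernel
    print(complex_conv1d(x, w))
    print(direct_complex_conv1d(x, w))
```

Both functions should return the same values. The decomposed form matters in practice because it lets a complex layer be built from ordinary real-valued convolution layers applied to the real and imaginary parts of the spectrogram.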

Keywords: speech enhancement; attention mechanism; complex convolutional network; encoding; Long Short-Term Memory (LSTM) network

Classification number: TN912.34 [Electronics and Telecommunications: Communication and Information Systems]

 
