Speech Enhancement Based on Deep Complex Axial Self-attention Convolutional Recurrent Network

Cited by: 1

Authors: CAO Jie [1,2]; WANG Qiao; LIANG Hao-Peng; WANG Chen-Zhang; LI Xiao-Xu; YU Hong [3]

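Affiliations: [1] School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China; [2] School of Information Engineering, Lanzhou City University, Lanzhou 730020, China; [3] School of Information and Electrical Engineering, Ludong University, Yantai 264025, China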

Source: Computer Systems & Applications, 2024, No. 4, pp. 60-68 (9 pages)

Funding: Gansu Province Key Research and Development Program (22YF7GA130).

Abstract: In single-channel speech enhancement, inaccurate phase estimation degrades the quality of the enhanced speech. To address this problem, this study proposes a speech enhancement method based on a deep complex axial self-attention convolutional recurrent network (DCACRN), which enhances the amplitude and phase information of speech simultaneously in the complex domain. First, an encoder built on a complex convolutional network extracts complex-valued features from the input speech signal, and a convolutional skip-connection module maps the features into a high-dimensional space for feature fusion, strengthening information interaction and gradient flow. Next, an encoder-decoder structure based on the axial self-attention mechanism is designed to improve the model's temporal modeling and feature extraction abilities. Finally, the decoder reconstructs the speech signal, and a hybrid loss function is used to optimize the network and improve the quality of the enhanced speech. Experiments on the public Valentini and DNS Challenge datasets show that the proposed method improves both perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI) relative to other models. On the non-reverberant dataset, PESQ improves by 12.8% over DCTCRN (deep cosine transform convolutional recurrent network) and by 3.9% over DCCRN (deep complex convolutional recurrent network), which validates the effectiveness of the proposed model for speech enhancement.
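The abstract names complex convolution as the basis of the encoder but does not spell out the operation. As a rough illustration only, and not the authors' implementation, the sketch below shows the standard way a complex convolution is assembled from four real-valued convolutions, so that the real and imaginary parts (and therefore amplitude and phase) are processed jointly. The function names `conv1d` and `complex_conv1d`, the 1-D setting, and the toy values are all assumptions made for this example; the paper's encoder presumably operates on 2-D time-frequency features.

```python
def conv1d(x, w):
    """Valid-mode 1-D cross-correlation of two real-valued sequences."""
    n = len(x) - len(w) + 1
    return [sum(x[i + k] * w[k] for k in range(len(w))) for i in range(n)]


def complex_conv1d(x_re, x_im, w_re, w_im):
    """Complex convolution built from four real convolutions.

    (x_re + j*x_im) * (w_re + j*w_im)
        = (x_re*w_re - x_im*w_im) + j*(x_re*w_im + x_im*w_re)
    """
    rr = conv1d(x_re, w_re)
    ii = conv1d(x_im, w_im)
    ri = conv1d(x_re, w_im)
    ir = conv1d(x_im, w_re)
    out_re = [a - b for a, b in zip(rr, ii)]
    out_im = [a + b for a, b in zip(ri, ir)]
    return out_re, out_im


# Toy input (hypothetical values): three complex samples, two-tap complex kernel.
x_re, x_im = [1.0, 2.0, 3.0], [0.5, -1.0, 0.0]
w_re, w_im = [0.5, -0.5], [1.0, 0.25]
out_re, out_im = complex_conv1d(x_re, x_im, w_re, w_im)

# Reference: the same sliding product using Python's built-in complex type.
x = [complex(a, b) for a, b in zip(x_re, x_im)]
w = [complex(a, b) for a, b in zip(w_re, w_im)]
ref = [
    sum(x[i + k] * w[k] for k in range(len(w)))
    for i in range(len(x) - len(w) + 1)
]
```

Comparing `out_re` and `out_im` against `ref` confirms that the four-real-convolution form matches direct complex arithmetic. In a deep learning framework the same decomposition would typically use two real convolution layers, one holding the real kernel and one holding the imaginary kernel, each applied to both input parts.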

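Keywords: single-channel speech enhancement; complex convolutional recurrent network; convolutional skip connection; axial self-attention mechanism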

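Classification: TN912.35 [Electronics and Telecommunications: Communication and Information Systems]; TP183 [Electronics and Telecommunications: Information and Communication Engineering]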
