Authors: CAO Jie[1,2]; WANG Qiao; LIANG Hao-Peng; WANG Chen-Zhang; LI Xiao-Xu; YU Hong[3]
Affiliations: [1] School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China [2] School of Information Engineering, Lanzhou City University, Lanzhou 730020, China [3] School of Information and Electrical Engineering, Ludong University, Yantai 264025, China
Source: Computer Systems & Applications (《计算机系统应用》), 2024, Issue 4, pp. 60-68 (9 pages)
Funding: Key Research and Development Program of Gansu Province (22YF7GA130).
Abstract: Inaccurate phase estimation in single-channel speech enhancement degrades the quality of the enhanced speech. To address this, this study proposes a speech enhancement method based on a deep complex axial self-attention convolutional recurrent network (DCACRN), which enhances both the magnitude and the phase of speech simultaneously in the complex domain. First, an encoder built on a complex convolutional network extracts complex-valued features from the input speech signal, and a convolutional skip-connection module maps these features into a high-dimensional space for feature fusion, strengthening the interaction between features and improving gradient flow. Next, an encoder-decoder structure based on the axial self-attention mechanism is designed, using axial self-attention to strengthen the model's temporal modeling and feature extraction capabilities. Finally, the decoder reconstructs the speech signal, and a hybrid loss function is used to optimize the network and further improve the quality of the enhanced speech. Experiments on the public Valentini and DNS Challenge datasets show that the proposed method outperforms the compared models on both perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI). On the non-reverberant dataset, PESQ improves by 12.8% over DCTCRN (deep cosine transform convolutional recurrent network) and by 3.9% over DCCRN (deep complex convolutional recurrent network), which validates the effectiveness of the proposed model for speech enhancement.
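The abstract states that the encoder extracts complex-valued features with a complex convolutional network. A common way to build such a layer (the DCCRN-style formulation; assumed here, since the abstract does not give DCACRN's layer details) is to realize the complex convolution (W_r + jW_i) * (X_r + jX_i) with real convolutions: the real part is W_r*X_r - W_i*X_i and the imaginary part is W_r*X_i + W_i*X_r. The minimal NumPy sketch below implements that identity; the function names, kernel size and input shape are hypothetical.

```python
import numpy as np


def conv2d_valid(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Real 2-D cross-correlation with 'valid' padding (what DL frameworks call convolution)."""
    kh, kw = w.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out


def complex_conv2d(x_r, x_i, w_r, w_i):
    """Complex convolution (W_r + jW_i) * (X_r + jX_i) built from four real convolutions."""
    real = conv2d_valid(x_r, w_r) - conv2d_valid(x_i, w_i)
    imag = conv2d_valid(x_i, w_r) + conv2d_valid(x_r, w_i)
    return real, imag


rng = np.random.default_rng(0)
x_r, x_i = rng.standard_normal((2, 6, 5))  # hypothetical 6-frame x 5-bin spectrogram patch
w_r, w_i = rng.standard_normal((2, 3, 2))  # hypothetical 3x2 complex kernel (real, imag parts)
real, imag = complex_conv2d(x_r, x_i, w_r, w_i)
print(real.shape, imag.shape)  # (4, 4) (4, 4)
```

Coupling the real and imaginary parts through complex multiplication, rather than treating them as two independent real channels, is what lets a complex-domain layer act on phase as well as magnitude.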
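Axial self-attention, as generally defined, factorizes attention over a time-frequency feature map of shape (T, F, C) into attention along the time axis (each frequency bin treated as a sequence of T frames) followed by attention along the frequency axis (each frame treated as a sequence of F bins). This reduces the cost from O((TF)^2) for full attention over all positions to O(TF(T+F)). The abstract does not specify DCACRN's head count, normalization or residual layout, so this NumPy sketch assumes a single head, random projection weights and a residual connection after each axial pass; all names are hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attend(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over the second-to-last axis.

    x: (..., L, C) -> (..., L, C); leading axes are treated as independent batches.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def axial_self_attention(x, params_t, params_f):
    """x: (T, F, C). Attend along time per frequency bin, then along frequency per frame."""
    # Time pass: (F, T, C), so each frequency bin is a length-T sequence.
    xt = np.swapaxes(x, 0, 1)
    xt = xt + attend(xt, *params_t)  # residual connection (assumed)
    x = np.swapaxes(xt, 0, 1)        # back to (T, F, C)
    # Frequency pass: each time frame is a length-F sequence.
    return x + attend(x, *params_f)  # residual connection (assumed)


rng = np.random.default_rng(0)
T, F, C = 8, 6, 4
x = rng.standard_normal((T, F, C))  # hypothetical encoder feature map
params_t = tuple(rng.standard_normal((C, C)) * 0.1 for _ in range(3))  # W_q, W_k, W_v
params_f = tuple(rng.standard_normal((C, C)) * 0.1 for _ in range(3))
y = axial_self_attention(x, params_t, params_f)
print(y.shape)  # (8, 6, 4)
```

In a trained model the projection matrices would be learned; here they are random placeholders that only illustrate how the two axial passes route information across time and frequency.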
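Keywords: single-channel speech enhancement; complex convolutional recurrent network; convolutional skip connection; axial self-attention mechanism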
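CLC numbers: TN912.35 [Electronics & Telecommunications / Communication and Information Systems]; TP183 [Electronics & Telecommunications / Information and Communication Engineering]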