Authors: ZHANG Chi (张池), WANG Zhong (王忠) [1], JIANG Tianhao (姜添豪), XIE Kangmin (谢康民)
Affiliations: [1] College of Electrical Engineering, Sichuan University, Chengdu 610065, Sichuan, China; [2] Wenzhou Power Supply Company, State Grid Zhejiang Electric Power Co., Ltd., Wenzhou 325029, Zhejiang, China
Source: Computer Engineering (《计算机工程》), 2024, No. 4, pp. 68-77 (10 pages)
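Funding: Sichuan Provincial Department of Science and Technology Support Program (2015FZ061); Sichuan Provincial Department of Education 2018 Key Natural Science Research Project (18ZA0307).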
Abstract: To address frequency-domain enhancement of speech degraded by interference, a speech enhancement network based on a parallel multi-attention mechanism and an encoder-decoder structure, known as PMAN, is proposed. The network takes as input speech frequency-domain features obtained through a Short-Time Fourier Transform (STFT), comprising the amplitude spectrum and the complex spectrum. The encoder integrates the input data using dense convolutional modules. The parallel multi-attention module in the intermediate layer learns both local and global information in the frequency domain and incorporates a Local Patch Attention (LPA) mechanism to capture the Two-Dimensional (2D) structure of the speech spectrum, separating clean speech from interference at the 2D level. The decoder integrates the learned information and generates an amplitude mask and a complex spectrum separately. The final complex spectrum of the speech is obtained by weighted summation, and a joint time-domain and frequency-domain loss function is used to fuse the phase information. Experimental results on the VoiceBank+DEMAND speech dataset show that, compared with the Two-Stage Transformer based time-domain speech enhancement Neural Network (TSTNN), speech enhanced by PMAN improves by 10.8% in Perceptual Evaluation of Speech Quality (PESQ), 1.1% in Short-Time Objective Intelligibility (STOI), and 11.8% in Segmental Signal-to-Noise Ratio (SSNR).
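The final fusion step described in the abstract (an amplitude mask and a directly estimated complex spectrum combined by weighted summation) might look roughly like the sketch below. The abstract does not give the weighting scheme, so the function name `fuse_spectra`, the single scalar weight `alpha`, and the reuse of the noisy phase in the masked branch are all assumptions made for illustration, not the authors' implementation.

```python
import cmath


def fuse_spectra(noisy, mask, complex_est, alpha=0.5):
    """Combine a masked-magnitude estimate with a direct complex estimate.

    noisy, complex_est: 2D lists (frequency x time) of complex STFT bins.
    mask: 2D list of non-negative real mask values with the same shape.
    alpha: hypothetical weight of the masked branch (not given in the source).
    """
    fused = []
    for noisy_row, mask_row, est_row in zip(noisy, mask, complex_est):
        row = []
        for x, m, c in zip(noisy_row, mask_row, est_row):
            # Masked branch: scale the noisy magnitude, keep the noisy phase.
            masked = cmath.rect(m * abs(x), cmath.phase(x))
            # Weighted sum of the two branches gives the output bin.
            row.append(alpha * masked + (1.0 - alpha) * c)
        fused.append(row)
    return fused


# Tiny example: one frequency bin, two time frames.
noisy = [[2 + 0j, 0 + 2j]]
mask = [[0.5, 1.0]]
complex_est = [[1 + 0j, 0 + 0j]]
result = fuse_spectra(noisy, mask, complex_est, alpha=0.5)
```

In a trained network the two branches would come from the decoder outputs, and the weights could be learned rather than fixed; the scalar `alpha` here only keeps the arithmetic visible.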
Keywords: speech enhancement; frequency domain; multi-attention mechanism; Transformer network; parallel module
Classification code: TN912.35 [Electronics and Telecommunications: Communication and Information Systems]