A Gated Recurrent Neural Network for Causal Speech Enhancement  Cited by: 3


Authors: LI Jianghe (李江和); WANG Mei (王玫) (College of Information Science and Engineering, Guilin University of Technology, Guilin, Guangxi 541006, China)

Affiliation: [1] College of Information Science and Engineering, Guilin University of Technology, Guilin, Guangxi 541006, China

Source: Computer Engineering (《计算机工程》), 2022, No. 11, pp. 77-82 (6 pages)

Funding: National Natural Science Foundation of China (62071135); Guangxi Natural Science Foundation (2020GXNSFAA159004).

Abstract: Traditional deep-learning-based speech enhancement methods typically use noncausal network input to improve the network's ability to model noisy speech. This input introduces a fixed delay and gives the speech enhancement system poor real-time performance. A gated recurrent neural network for causal speech enhancement, called CGRU, is proposed to solve the fixed-delay problem in real-time speech enhancement systems and to improve enhancement performance. To better model the correlation of noisy speech signals, the network unit fuses the input and output of the previous time step when computing the output at the current time step. In addition, a linear gating mechanism controls information flow to alleviate overfitting during network training. Because causal speech enhancement systems have strict real-time requirements, the CGRU adopts a single-gate design to reduce structural complexity and improve real-time performance. Experimental results show that the CGRU outperforms traditional network structures such as the Gated Recurrent Unit (GRU), Simple Recurrent Neural Network (SRNN), and Simple Recurrent Unit (SRU) in perceptual speech quality, objective speech intelligibility, and Segmental Signal-to-Noise Ratio (SSNR) of the enhanced speech. At a Signal-to-Noise Ratio (SNR) of 0 dB, the average perceptual speech quality and average objective speech intelligibility of the CGRU reach 2.4 and 0.786, respectively.
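The fixed delay that the abstract attributes to noncausal input can be illustrated with a small sketch. It is not taken from the paper: the function names `stack_context` and `lookahead_delay_ms`, the edge-padding choice, and the 16 ms hop size are all assumptions made for illustration. The sketch stacks each feature frame with its neighbors and shows that any use of future frames forces the system to wait for them, whereas a causal window (no future frames) adds no lookahead delay.

```python
import numpy as np


def stack_context(frames, past, future):
    """Stack each frame with `past` previous and `future` following frames.

    frames: array of shape (T, F), one feature vector per time step.
    Edges are padded by repeating the first/last frame (an assumption).
    Returns an array of shape (T, (past + 1 + future) * F).
    """
    num_frames = frames.shape[0]
    padded = np.concatenate(
        [
            np.repeat(frames[:1], past, axis=0),     # left padding
            frames,
            np.repeat(frames[-1:], future, axis=0),  # right padding
        ],
        axis=0,
    )
    # Window i holds, for every time step t, the frame at offset (i - past).
    windows = [padded[i:i + num_frames] for i in range(past + 1 + future)]
    return np.concatenate(windows, axis=1)


def lookahead_delay_ms(future, hop_ms):
    """Fixed delay caused by waiting for `future` frames of lookahead."""
    return future * hop_ms


# Toy input: 5 frames with 2 features each (hypothetical values).
frames = np.arange(10, dtype=float).reshape(5, 2)

causal_input = stack_context(frames, past=2, future=0)     # shape (5, 6)
noncausal_input = stack_context(frames, past=2, future=2)  # shape (5, 10)

# With an assumed 16 ms hop, two lookahead frames cost 32 ms of fixed delay.
causal_delay = lookahead_delay_ms(future=0, hop_ms=16.0)
noncausal_delay = lookahead_delay_ms(future=2, hop_ms=16.0)
```

In this sketch, row `t` of `causal_input` depends only on frames up to `t`, so changing a later frame leaves earlier rows untouched. This is the property a causal enhancement network such as the CGRU relies on, with the recurrent state rather than future frames supplying additional context.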

Keywords: gated recurrent neural network; fixed delay; causal speech enhancement; speech quality; speech intelligibility

Classification code: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]

 
