Acoustic Echo Cancellation Algorithm Incorporating Dynamic Scene Perception and Attention Mechanisms


Authors: XU Chundong [1], HUANG Qiaoyue, WANG Lei, XU Jinwu

Affiliation: [1] School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000, China

Source: Journal of Signal Processing (信号处理), 2024, No. 2, pp. 396-405 (10 pages)

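Funding: National Natural Science Foundation of China (11864016, 11704164); Key R&D Program (General Project) of the Jiangxi Provincial Department of Science and Technology (20202BBEL53006); Graduate Innovation Special Fund of Jiangxi University of Science and Technology (XY2022-S167).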

Abstract: Removing acoustic echo to obtain clear speech is one of the most closely watched challenges in real-time audio and video communication systems. Acoustic echo cancellation (AEC) aims to eliminate acoustic echo from such systems, improving speech quality during calls and giving users a good call experience. However, conventional echo cancellation systems suffer from limited echo suppression, residual nonlinear echo, and an inability to process echo in real time. To address these problems, an AEC algorithm is proposed that combines a dynamic scene perception module (DSPM) with a global attention mechanism (GAM). The algorithm uses a convolutional recurrent network (CRN) as the baseline model to extract sequential features from speech signals. First, the DSPM replaces the original causal convolution in the encoder and dynamically allocates the number of convolution kernels according to the scene, which strengthens the model's adaptability. Second, a GAM is introduced into each of the last two encoder layers to amplify spatial-channel relationships and coordinate global interactions, improving both feature extraction from speech signals and echo cancellation performance. Finally, a new loss function, MSE-HuberLoss, is formed by linearly adding the MSE and Huber loss functions, which further improves the robustness of the model. Experimental results show that the proposed GAM-DSPM-CRN model achieves excellent echo cancellation performance and reconstructs a clearer speech signal than the baseline model. Under double-talk conditions, the proposed GAM-DSPM-CRN algorithm substantially outperforms the other algorithms compared. On the Microsoft AEC Challenges dataset, the MOS, ERLE, and STOI scores reach 4.09, 57.43, and 0.78, respectively.
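The abstract states only that MSE-HuberLoss is a linear sum of the MSE and Huber losses. The following is a minimal sketch of such a combination, not the authors' implementation: the weights `alpha` and `beta`, the Huber threshold `delta`, and all function names are hypothetical, since the abstract does not give them.

```python
from typing import Sequence


def mse_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error over paired samples."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equal in length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def huber_loss(pred: Sequence[float], target: Sequence[float],
               delta: float = 1.0) -> float:
    """Mean Huber loss: quadratic for |error| <= delta, linear beyond it."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equal in length")
    total = 0.0
    for p, t in zip(pred, target):
        err = abs(p - t)
        if err <= delta:
            total += 0.5 * err ** 2
        else:
            total += delta * (err - 0.5 * delta)
    return total / len(pred)


def mse_huber_loss(pred: Sequence[float], target: Sequence[float],
                   alpha: float = 1.0, beta: float = 1.0,
                   delta: float = 1.0) -> float:
    """Linear combination of MSE and Huber loss.

    alpha, beta, and delta are assumed values for illustration only;
    the abstract does not specify them.
    """
    return alpha * mse_loss(pred, target) + beta * huber_loss(pred, target, delta)


# One exact sample and one outlier with error 3:
# MSE = (0 + 9) / 2 = 4.5, Huber = (0 + 2.5) / 2 = 1.25
print(mse_huber_loss([0.0, 3.0], [0.0, 0.0]))  # 5.75
```

The quadratic MSE term penalizes large errors heavily, while the Huber term grows only linearly beyond `delta`; summing the two is one plausible reading of how the combined loss trades sensitivity against robustness to outliers.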

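Keywords: acoustic echo cancellation; dynamic scene perception module; global attention mechanism; convolutional recurrent network; joint loss function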

Classification code: TN912.3 [Electronics and Telecommunications: Communication and Information Systems]

 
