i-TDNN: An Improved TDNN-Based Method for Speaker Recognition in Noise  Cited by: 2

i-TDNN: An improved noise speaker recognition method based on TDNN


Authors: WU Xiong (伍雄), CHEN Weizhen (陈为真)[1] (School of Electrical and Electronic Engineering, Wuhan Polytechnic University, Wuhan 430048, China)

Affiliation: [1] School of Electrical and Electronic Engineering, Wuhan Polytechnic University, Wuhan, Hubei 430048, China

Source: Changjiang Information & Communications (《长江信息通信》), 2023, No. 2, pp. 27-30 (4 pages)

Funding: Science and Technology Project of the Hubei Provincial Department of Education (B2020061).

Abstract: Speaker (voiceprint) recognition is not robust against background noise. To address this, the paper proposes i-TDNN, an improved end-to-end speaker-embedding method based on the time-delay neural network (TDNN). The algorithm first extracts the Mel spectrogram of the speaker's audio. An attention mechanism (SE) makes the network focus on the most informative features. Residual connections (Res) compensate for feature information lost between the Mel spectrogram and the output layer, which partly alleviates the degradation problem of deep neural networks. Multi-layer feature aggregation (MFA) densely connects the outputs of the intermediate layers to produce the features passed to attentive statistics pooling, which finally yields a highly robust speaker embedding. Experiments on the noisy AISHELL-1 (AISHELL-ASR0009) dataset show that i-TDNN improves recognition accuracy by 16.63% over the baseline TDNN (Base-TDNN), verifying the robustness of the method in noisy conditions.
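The abstract names the building blocks of i-TDNN but gives no layer sizes, kernel widths, or dilations. The NumPy sketch below shows one plausible way these blocks could fit together, loosely in the style of ECAPA-TDNN-like models. It is not the authors' implementation. Everything specific here is a hypothetical assumption: that "SE" is a squeeze-and-excitation channel gate; the channel counts, kernel width 3, and dilations (2, 3, 4); the single-vector attention used in the pooling; and the function names `tdnn_layer`, `se_block`, `attentive_stats_pool`, `init`, and `i_tdnn_forward`. Weights are random, so the output is an untrained embedding that only demonstrates the data flow and shapes.

```python
import numpy as np

rng = np.random.default_rng(0)

def tdnn_layer(x, w, b, dilation):
    """TDNN layer = dilated 1-D convolution over time + ReLU.
    x: (C_in, T), w: (C_out, C_in, K) with odd K. 'Same' zero padding keeps T."""
    c_out, _, k = w.shape
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    t = x.shape[1]
    out = np.zeros((c_out, t))
    for j in range(k):
        # tap j looks at frames offset by j*dilation - pad from each output frame
        out += w[:, :, j] @ xp[:, j * dilation : j * dilation + t]
    return np.maximum(out + b[:, None], 0.0)

def se_block(x, w1, w2):
    """Squeeze-and-excitation (assumed meaning of 'SE'): per-channel gate in (0, 1)."""
    s = x.mean(axis=1)                   # squeeze: average over time -> (C,)
    z = np.maximum(w1 @ s, 0.0)          # bottleneck + ReLU
    g = 1.0 / (1.0 + np.exp(-(w2 @ z)))  # sigmoid gate per channel
    return x * g[:, None]                # re-weight channels

def attentive_stats_pool(h, v, W):
    """Attentive statistics pooling (simplified, one shared attention vector v):
    attention-weighted mean and standard deviation over frames."""
    e = v @ np.tanh(W @ h)               # frame scores, shape (T,)
    a = np.exp(e - e.max())
    a /= a.sum()                         # softmax over frames
    mu = h @ a                           # weighted mean, shape (C,)
    var = (h ** 2) @ a - mu ** 2         # weighted variance
    sigma = np.sqrt(np.clip(var, 1e-8, None))
    return np.concatenate([mu, sigma])   # shape (2C,)

def init(c_in, c, n_blocks, emb_dim, att_dim=16, se_dim=8, k=3):
    """Hypothetical sizes; random weights stand in for trained parameters."""
    p = {"w0": rng.normal(0, 0.1, (c, c_in, k)), "b0": np.zeros(c), "blocks": []}
    for _ in range(n_blocks):
        p["blocks"].append({
            "w": rng.normal(0, 0.1, (c, c, k)), "b": np.zeros(c),
            "se1": rng.normal(0, 0.1, (se_dim, c)),
            "se2": rng.normal(0, 0.1, (c, se_dim)),
        })
    c_mfa = c * n_blocks                 # channels after MFA concatenation
    p["att_W"] = rng.normal(0, 0.1, (att_dim, c_mfa))
    p["att_v"] = rng.normal(0, 0.1, att_dim)
    p["emb"] = rng.normal(0, 0.1, (emb_dim, 2 * c_mfa))
    return p

def i_tdnn_forward(mel, p, dilations=(2, 3, 4)):
    """mel: (n_mels, T) log-Mel spectrogram -> (emb_dim,) speaker embedding.
    Assumes len(p['blocks']) == len(dilations)."""
    h = tdnn_layer(mel, p["w0"], p["b0"], dilation=1)
    outs = []
    for blk, d in zip(p["blocks"], dilations):
        y = tdnn_layer(h, blk["w"], blk["b"], dilation=d)
        y = se_block(y, blk["se1"], blk["se2"])
        h = h + y                        # residual connection (Res)
        outs.append(h)
    mfa = np.concatenate(outs, axis=0)   # multi-layer feature aggregation (MFA)
    pooled = attentive_stats_pool(mfa, p["att_v"], p["att_W"])
    return p["emb"] @ pooled             # linear projection to the embedding

params = init(c_in=40, c=32, n_blocks=3, emb_dim=16)
mel = rng.normal(size=(40, 200))         # stand-in for 200 frames of 40-band log-Mel
emb = i_tdnn_forward(mel, params)
print(emb.shape)                         # (16,)
```

In this sketch the residual sum runs alongside each SE-gated TDNN block, and MFA is a plain channel-wise concatenation of every block's output before pooling; pooling both mean and standard deviation is what turns a variable-length utterance into a fixed-size vector. A trained system would learn these weights from labelled speakers, which the sketch does not attempt.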

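Keywords: speaker recognition; time-delay neural network; self-attention mechanism; residual connection; multi-layer feature aggregation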

CLC number: TP912 [Automation and Computer Technology]
