Authors: 伍雄 (WU Xiong), 陈为真 (CHEN Weizhen)[1]
Affiliation: [1] School of Electrical and Electronic Engineering, Wuhan Polytechnic University (武汉轻工大学), Wuhan 430048, Hubei, China
Source: Changjiang Information & Communications (《长江信息通信》), 2023, No. 2, pp. 27-30 (4 pages)
Funding: Science and Technology Project of the Hubei Provincial Department of Education (B2020061).
Abstract: Voiceprint (speaker) recognition is not robust under background noise. To address this, the paper proposes i-TDNN, an end-to-end speaker recognition method for noisy conditions based on an improved TDNN (time-delay neural network). The algorithm first extracts the Mel spectrogram of the speaker's audio. A self-attention mechanism (SE) makes the network focus on the most important features; residual connections (Res) compensate for the feature information lost between the Mel spectrogram and the output layer, partly alleviating network degradation; and multi-layer feature aggregation (MFA) densely connects the output features to produce the features used by attentive statistics pooling, from which a robust speaker embedding is finally generated. Experiments on a noisy version of the AISHELL-ASR0009 (AISHELL-1) dataset show that i-TDNN improves recognition accuracy by 16.63% over Base-TDNN, confirming its robustness against background noise.
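The abstract names several building blocks without showing how they fit together. The sketch below is a forward pass written with NumPy under stated assumptions, not the authors' i-TDNN. It assumes an ECAPA-TDNN-style layout: a TDNN input layer, three TDNN blocks each wrapped in a residual connection, MFA that concatenates the block outputs and fuses them with a 1x1 layer, attentive statistics pooling, and a linear embedding layer. For "SE" it uses the squeeze-and-excitation channel gate that the abbreviation usually denotes. All function names, layer sizes, kernel widths, dilations and the attention form are hypothetical. The weights are random and untrained, and a random matrix stands in for a real log-Mel spectrogram.

```python
import numpy as np


def tdnn_layer(x, w, dilation):
    """One TDNN layer: a dilated 1-D convolution over frames, then ReLU.

    x: (channels_in, frames); w: (channels_out, channels_in, kernel), kernel odd.
    Zero padding keeps the number of frames unchanged.
    """
    _, _, k = w.shape
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    frames = x.shape[1]
    out = np.zeros((w.shape[0], frames))
    for j in range(k):
        # Tap j sees the frame offset (j - (k - 1) / 2) * dilation from the centre.
        out += w[:, :, j] @ xp[:, j * dilation : j * dilation + frames]
    return np.maximum(out, 0.0)


def se_block(x, w1, w2):
    """Squeeze-and-excitation: re-weight channels by gates computed from their time average."""
    s = x.mean(axis=1)                       # squeeze: one value per channel
    z = np.maximum(w1 @ s, 0.0)              # bottleneck FC + ReLU
    g = 1.0 / (1.0 + np.exp(-(w2 @ z)))      # expansion FC + sigmoid, gates in (0, 1)
    return x * g[:, None]


def se_res_block(x, p, dilation):
    """TDNN layer followed by SE, wrapped in a residual (skip) connection."""
    h = se_block(tdnn_layer(x, p["w"], dilation), p["se1"], p["se2"])
    return x + h


def attentive_stats_pool(h, v, W):
    """Attention-weighted mean and standard deviation over frames."""
    e = v @ np.tanh(W @ h)                   # one score per frame
    a = np.exp(e - e.max())
    a /= a.sum()                             # softmax over frames
    mu = h @ a
    var = (h ** 2) @ a - mu ** 2
    return np.concatenate([mu, np.sqrt(np.maximum(var, 1e-8))])


def init_params(n_mels=80, c=64, att=32, emb_dim=192, seed=0):
    """Random weights with hypothetical sizes; a real model would be trained."""
    rng = np.random.default_rng(seed)

    def w(*shape):
        return rng.standard_normal(shape) * 0.1

    return {
        "in": w(c, n_mels, 5),
        "blocks": [{"w": w(c, c, 3), "se1": w(c // 4, c), "se2": w(c, c // 4)}
                   for _ in range(3)],
        "mfa": w(3 * c, 3 * c, 1),
        "att_W": w(att, 3 * c),
        "att_v": w(att),
        "emb": w(emb_dim, 6 * c),
    }


def speaker_embedding(mel, p):
    """Log-Mel features (n_mels, frames) -> fixed-size speaker embedding."""
    x = tdnn_layer(mel, p["in"], dilation=1)
    outs = []
    for block, d in zip(p["blocks"], (2, 3, 4)):
        x = se_res_block(x, block, d)
        outs.append(x)
    # Multi-layer feature aggregation: stack every block's output, fuse with a 1x1 layer.
    mfa = tdnn_layer(np.concatenate(outs, axis=0), p["mfa"], dilation=1)
    pooled = attentive_stats_pool(mfa, p["att_v"], p["att_W"])
    return p["emb"] @ pooled


if __name__ == "__main__":
    params = init_params()
    # Stand-in for an 80-band log-Mel spectrogram of 200 frames.
    mel = np.random.default_rng(1).standard_normal((80, 200))
    print(speaker_embedding(mel, params).shape)  # (192,)
```

The residual addition requires each block to keep the channel count unchanged. Pooling both the mean and the standard deviation turns a variable number of frames into a vector of fixed length. In practice, two utterances would typically be compared by the cosine similarity of their embeddings.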