Authors: WANG Chao (王超); YAO Shanshan (姚姗姗) (Institute of Big Data Science and Industry, Shanxi University, Taiyuan, Shanxi 030006, China)
Affiliation: [1] Institute of Big Data Science and Industry, Shanxi University, Taiyuan 030006, China
Source: 《计算机应用》 (Journal of Computer Applications), 2024, Issue 12, pp. 3899-3906 (8 pages)
Funding: National Natural Science Foundation of China (61906115); Shanxi Provincial Basic Research Program (202303021221075).
Abstract: To address the severe performance degradation of current Speaker Verification (SV) methods in complex test scenarios or when speech quality degrades substantially, a speaker verification method based on speech Quality Adaptation and a Triplet-like idea (QATM) was proposed. First, the feature norm of a speaker's speech was used as a proxy for speech quality. Then, different loss functions were selected according to whether the speech quality was high or low, adjusting the importance of samples of different quality so that hard samples with high speech quality were emphasized and hard samples with low speech quality were ignored. Finally, the triplet-like idea was used to improve both the AM-Softmax (Additive Margin Softmax) loss and the AAM-Softmax (Additive Angular Margin Softmax) loss, focusing more on hard speaker samples and mitigating the damage that hard samples of very poor speech quality cause to the model. Experimental results show that, with the VoxCeleb2 development set as the training set, the proposed method reduces the Equal Error Rate (EER) on the VoxCeleb1-O test set by 6.41%, 3.89%, and 7.27% compared with the AAM-Softmax-loss-based method under the Half-ResNet34, ResNet34, and ECAPA-TDNN (Emphasized Channel Attention, Propagation and Aggregation in Time Delay Neural Network) architectures, respectively. With Cn-Celeb.Train as the training set, the proposed method reduces the EER on the Cn-Celeb.Eval evaluation set by 5.25% compared with the AAM-Softmax-loss-based method under the Half-ResNet34 architecture. These results indicate that the proposed method improves accuracy in both ordinary and complex scenarios.
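The abstract describes a general recipe: treat the embedding (feature) norm as a proxy for speech quality, and use that quality signal to reweight a margin-based softmax loss so that low-quality hard samples contribute less. The sketch below is a minimal, hypothetical Python (PyTorch) illustration of that general idea only; it is not the paper's QATM loss. The class name QualityAwareAAMSoftmax, the sigmoid quality mapping, and all hyper-parameters (scale, margin, norm_threshold) are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class QualityAwareAAMSoftmax(nn.Module):
    """Illustrative sketch: AAM-Softmax whose per-sample weight depends on the
    embedding norm, used here as a rough proxy for speech quality.
    NOT the paper's QATM loss; all hyper-parameters are hypothetical."""

    def __init__(self, embed_dim, num_speakers, scale=30.0, margin=0.2,
                 norm_threshold=10.0):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(num_speakers, embed_dim))
        nn.init.xavier_uniform_(self.weight)
        self.scale = scale                    # logit scale s
        self.margin = margin                  # additive angular margin m
        self.norm_threshold = norm_threshold  # hypothetical quality cut-off

    def forward(self, embeddings, labels):
        # Embedding norm as a crude quality indicator (larger = "better").
        norms = embeddings.norm(dim=1)                         # (B,)
        quality = torch.sigmoid(norms - self.norm_threshold)   # (B,) in (0, 1)

        # Cosine similarity between normalized embeddings and class weights.
        cosine = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        theta = torch.acos(cosine.clamp(-1 + 1e-7, 1 - 1e-7))

        # Apply the additive angular margin only to the target class.
        one_hot = F.one_hot(labels, num_classes=cosine.size(1)).float()
        logits = torch.cos(theta + self.margin * one_hot) * self.scale

        # Down-weight low-quality samples so their (possibly noisy) hard-example
        # gradients contribute less; high-quality samples keep full weight.
        per_sample = F.cross_entropy(logits, labels, reduction="none")
        return (quality * per_sample).mean()

In this sketch the quality weight simply scales the per-sample cross-entropy; the paper instead switches between differently modified AM-/AAM-Softmax losses depending on the judged speech quality, which this illustration does not reproduce.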
Keywords: speaker verification; hard samples; speech quality; adaptation; triplet-like idea
CLC numbers: TN912.34 [Electronics and Telecommunications—Communication and Information Systems]; TP391.42 [Electronics and Telecommunications—Information and Communication Engineering]