Authors: WANG Chao (王超); YAO Shanshan (姚姗姗) (Institute of Big Data Science and Industry, Shanxi University, Taiyuan, Shanxi 030006, China)
Affiliation: [1] Institute of Big Data Science and Industry, Shanxi University, Taiyuan 030006, China
Source: 《计算机应用》 (Journal of Computer Applications), 2024, Issue 12, pp. 3899-3906 (8 pages)
Funding: National Natural Science Foundation of China (61906115); Shanxi Provincial Basic Research Program (202303021221075).
Abstract: To address the severe performance degradation of current Speaker Verification (SV) methods in complex test scenarios or when speech quality degrades substantially, a speaker verification method based on speech Quality Adaptation and a Triplet-like idea (QATM) was proposed. First, the feature norm of a speaker's speech was used as a proxy for speech quality. Then, different loss functions were selected according to whether the speech quality was high or low, adjusting the importance of samples of different quality so that hard samples with high speech quality were emphasized and hard samples with low speech quality were ignored. Finally, the triplet-like idea was used to improve both the AM-Softmax (Additive Margin Softmax) loss and the AAM-Softmax (Additive Angular Margin Softmax) loss, focusing more on hard speaker samples and mitigating the damage that hard samples of very poor speech quality cause to the model. Experimental results show that, with the VoxCeleb2 development set as the training set, the proposed method reduces the Equal Error Rate (EER) on the VoxCeleb1-O test set by 6.41%, 3.89%, and 7.27% compared with the AAM-Softmax-loss-based method under the Half-ResNet34, ResNet34, and ECAPA-TDNN (Emphasized Channel Attention, Propagation and Aggregation in Time Delay Neural Network) architectures, respectively. With Cn-Celeb.Train as the training set, the proposed method reduces the EER on the Cn-Celeb.Eval evaluation set by 5.25% compared with the AAM-Softmax-loss-based method under the Half-ResNet34 architecture. These results indicate that the proposed method improves accuracy in both ordinary and complex scenarios.
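The abstract describes a general recipe: treat the embedding (feature) norm as a proxy for speech quality, and use that quality signal to reweight a margin-based softmax loss so that low-quality hard samples contribute less. The sketch below is a minimal, hypothetical Python (PyTorch) illustration of that general idea only; it is not the paper's QATM loss. The class name QualityAwareAAMSoftmax, the sigmoid quality mapping, and all hyper-parameters (scale, margin, norm_threshold) are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class QualityAwareAAMSoftmax(nn.Module):
    """Illustrative sketch: AAM-Softmax whose per-sample weight depends on the
    embedding norm, used here as a rough proxy for speech quality.
    NOT the paper's QATM loss; all hyper-parameters are hypothetical."""

    def __init__(self, embed_dim, num_speakers, scale=30.0, margin=0.2,
                 norm_threshold=10.0):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(num_speakers, embed_dim))
        nn.init.xavier_uniform_(self.weight)
        self.scale = scale                    # logit scale s
        self.margin = margin                  # additive angular margin m
        self.norm_threshold = norm_threshold  # hypothetical quality cut-off

    def forward(self, embeddings, labels):
        # Embedding norm as a crude quality indicator (larger = "better").
        norms = embeddings.norm(dim=1)                         # (B,)
        quality = torch.sigmoid(norms - self.norm_threshold)   # (B,) in (0, 1)

        # Cosine similarity between normalized embeddings and class weights.
        cosine = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        theta = torch.acos(cosine.clamp(-1 + 1e-7, 1 - 1e-7))

        # Apply the additive angular margin only to the target class.
        one_hot = F.one_hot(labels, num_classes=cosine.size(1)).float()
        logits = torch.cos(theta + self.margin * one_hot) * self.scale

        # Down-weight low-quality samples so their (possibly noisy) hard-example
        # gradients contribute less; high-quality samples keep full weight.
        per_sample = F.cross_entropy(logits, labels, reduction="none")
        return (quality * per_sample).mean()

In this sketch the quality weight simply scales the per-sample cross-entropy; the paper instead switches between differently modified AM-/AAM-Softmax losses depending on the judged speech quality, which this illustration does not reproduce.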
Keywords: speaker verification; hard samples; speech quality; adaptation; triplet-like idea
CLC numbers: TN912.34 [Electronics and Telecommunications—Communication and Information Systems]; TP391.42 [Electronics and Telecommunications—Information and Communication Engineering]