Coupled Generative and Contrastive Self-supervised Pre-training Method for Voiceprint Recognition  Cited by: 1

Original title (Chinese): 生成式与对比式耦合的声纹识别自监督预训练方法

Authors: JIANG Shiwei (蒋世炜); QIAN Yuhua (钱宇华); YUAN Zhian (原之安); LIANG Xinyan (梁新彦)

Affiliations: [1] Research Institute of Big Data Science and Industry, Shanxi University, Taiyuan 030006, China [2] Engineering Research Center for Machine Vision and Data Mining of Shanxi Province, Taiyuan 030006, China [3] School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2024, No. 8, pp. 1847-1853 (7 pages)

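Funding: Key Program of the National Natural Science Foundation of China (62136005); National Key Research and Development Program of China (2021ZD0112400); Shanxi Province Science and Technology Major Special Project, "open bidding" (揭榜挂帅) program (202201020101006); Shanxi Province Youth Science Foundation (20210302124556).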

Abstract: Self-supervised learning has become the main approach to alleviating the shortage of labeled training data in speaker recognition. However, existing studies focus only on learning the global features of samples and neglect local features. To address this, the paper proposes a self-supervised framework for speaker recognition that couples generative modeling with contrastive modeling. The framework retains the constraint that contrastive modeling places on the learned global features and adds a constraint from generative modeling on the learned local features, so that the feature extraction model learns more discriminative features. Based on this framework, the paper proposes a new self-supervised learning method for speaker recognition, DINO-MFM. Experimental results show that DINO-MFM outperforms other self-supervised methods, with a 6.4% relative decrease in equal error rate compared with the contrastive method DINO.
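The abstract describes the framework only at a high level. As a rough illustration of what coupling the two objectives could look like, the sketch below adds a DINO-style self-distillation loss on utterance-level outputs (the global term) to a reconstruction loss on masked frames (the local term). Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names, the temperatures, the choice of mean squared error for reconstruction, the frame masking, and the weight `local_weight` are all hypothetical. Teacher-output centering, which DINO normally applies, is omitted for brevity.

```python
import math


def log_softmax(logits, temperature):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_norm = math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - peak - log_norm for x in scaled]


def global_loss(student_logits, teacher_logits,
                student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy from the teacher distribution to the student one.

    Stands in for the global (contrastive-side) constraint.
    The temperature values are assumed, not from the paper.
    """
    teacher_probs = [math.exp(v)
                     for v in log_softmax(teacher_logits, teacher_temp)]
    student_log_probs = log_softmax(student_logits, student_temp)
    return -sum(t * s for t, s in zip(teacher_probs, student_log_probs))


def local_loss(original_frames, reconstructed_frames, masked_indices):
    """Mean squared error computed over the masked frames only.

    Stands in for the local (generative-side) constraint.
    """
    total, count = 0.0, 0
    for i in masked_indices:
        for a, b in zip(original_frames[i], reconstructed_frames[i]):
            total += (a - b) ** 2
            count += 1
    return total / count if count else 0.0


def coupled_loss(student_logits, teacher_logits,
                 original_frames, reconstructed_frames, masked_indices,
                 local_weight=1.0):
    """Weighted sum of the global and local terms (hypothetical weighting)."""
    g = global_loss(student_logits, teacher_logits)
    l = local_loss(original_frames, reconstructed_frames, masked_indices)
    return g + local_weight * l
```

In this sketch, setting `local_weight` to zero recovers the purely global objective, which makes it easy to see what the added local term contributes. How the two terms are actually balanced in DINO-MFM is not stated in the abstract.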

Keywords: voiceprint recognition; speaker recognition; self-supervised learning; generative learning; contrastive learning

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
