Multi-Model Fusion VoxSRC22 Speaker Diarization System

Authors: DU Yuxuan; ZHOU Ruohua (School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 102616, China)

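Affiliation: [1] School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 102616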

Source: Computer Engineering and Applications, 2024, No. 10, pp. 164-172 (9 pages)

Abstract: To address the question of "who spoke when," a speaker diarization method is proposed. The method consists of six modules: voice activity detection (VAD), speech enhancement, speaker embedding extraction, speaker clustering, overlapping speech detection (OSD), and result fusion. Speech enhancement improves the performance of voice activity detection. Effectively combining different speaker embedding extractors and clustering algorithms further reduces the system error rate. Handling overlapping speech after system fusion gives the best results. Experimental results show that the best system improves on the baseline by a relative 72% and achieves a diarization error rate (DER) of 5.48% and a Jaccard error rate (JER) of 32.10% on the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2022 evaluation set, ranking fourth.
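For readers unfamiliar with the headline metric, the following is a minimal sketch of how a diarization error rate and a relative improvement over a baseline are commonly computed. It assumes the standard definition of DER as the sum of missed speech, false alarm, and speaker confusion time divided by total reference speech time. The function names and the durations are hypothetical illustrations and are not taken from the paper.

```python
def diarization_error_rate(missed, false_alarm, confusion, total_speech):
    """Return DER as a fraction of total reference speech time.

    All arguments are durations in seconds. This follows the common
    definition (missed + false alarm + confusion) / total reference speech;
    it is an assumption here, not the paper's scoring code.
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + confusion) / total_speech


def relative_improvement(baseline, system):
    """Return the relative error reduction of `system` over `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - system) / baseline


# Hypothetical durations, chosen only to illustrate the arithmetic.
der = diarization_error_rate(
    missed=2.0, false_alarm=1.0, confusion=3.0, total_speech=100.0
)
print(f"DER = {der:.2%}")
```

A relative improvement is measured against the baseline error, so `relative_improvement(0.20, 0.05)` corresponds to a 75% relative reduction even though the absolute difference is 15 percentage points.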

Keywords: speaker diarization; voice activity detection; speaker embedding; speaker clustering; result fusion

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
