Authors: 杜雨轩 (DU Yuxuan); 周若华 (ZHOU Ruohua) (School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 102616, China)
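Affiliation: [1] School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture (北京建筑大学), Beijing 102616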
Source: Computer Engineering and Applications (《计算机工程与应用》), 2024, No. 10, pp. 164-172 (9 pages)
Abstract: To address the "who spoke when" problem effectively, a speaker diarization method is proposed. The method consists of six modules: voice activity detection (VAD), speech enhancement, a speaker embedding extractor, speaker clustering, overlapping speech detection (OSD), and result fusion. Speech enhancement improves the performance of voice activity detection. Effectively combining different speaker embedding extractors and clustering algorithms further reduces the system error rate. Handling overlapping speech after system fusion gives the best results. Experimental results show that the best system achieves a 72% relative improvement over the baseline, with a diarization error rate (DER) of 5.48% and a Jaccard error rate (JER) of 32.10% on the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2022 evaluation set, ranking fourth.
Keywords: speaker diarization; voice activity detection; voiceprint embedding; speaker clustering; result fusion
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]