Authors: AISIKAER Rouzi (艾斯卡尔·肉孜) [1], WANG Dong (王东) [1], LI Lantian (李蓝天), ZHENG Fang (郑方) [1], ZHANG Xiaodong (张晓东) [2], JIN Panshi (金磐石) [2]
Affiliations: [1] Center for Speech and Language Technologies, Division of Technical Innovation and Development, Tsinghua National Laboratory for Information Science and Technology; Center for Speech and Language Technologies, Research Institute of Information Technology; Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China. [2] Information Technology Management Department, China Construction Bank, Beijing 100000, China
Source: Journal of Tsinghua University (Science and Technology) (《清华大学学报(自然科学版)》), 2018, No. 4, pp. 337-341 (5 pages)
Funding: National Natural Science Foundation of China (61271389, 61371136); National Basic Research Program of China ("973" Program) (2013CB329302)
Abstract: Variations in speaking rate significantly degrade the performance of speaker recognition systems. This paper proposes a score-domain speaking rate normalization method to reduce that impact. Cohort (reference) sets, global and local, are built from speech recorded at different speaking rates; the local cohort sets are obtained by splitting the global cohort set according to relative speaking rate. For each enrolled speaker, the score distribution against each class of cohort speech is estimated. Using a speaking rate database recorded for this study, the scores of test utterances are normalized within a GMM-UBM (Gaussian mixture model-universal background model) framework, and the data sparsity problem is addressed by augmenting the training data, giving a final relative reduction in equal error rate (EER) of 33.33%. The results show that both global and local normalization effectively reduce the impact of speaking rate variation on speaker recognition.
Keywords: speaker recognition; score domain; speaking rate normalization; relative speaking rate; GMM-UBM
Classification: TP391.4 [Automation and Computer Technology: Computer Application Technology]
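
The abstract describes score normalization against global and local cohort sets but does not give the formula. The following is a minimal sketch of one way the idea could look, assuming a Z-norm-style mean and standard deviation normalization per enrolled speaker; that choice, the function names, and the speaking rate bin edges are all illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def cohort_stats(cohort_scores):
    """Mean and std of one enrolled speaker's scores against a cohort set."""
    scores = np.asarray(cohort_scores, dtype=float)
    # Floor the std so a degenerate cohort cannot cause division by zero.
    return scores.mean(), max(scores.std(), 1e-8)


def normalize_global(raw_score, cohort_scores):
    """Assumed Z-norm-style normalization using the whole (global) cohort."""
    mu, sigma = cohort_stats(cohort_scores)
    return (raw_score - mu) / sigma


def split_by_relative_rate(cohort_scores, cohort_rates, edges):
    """Split the global cohort into local cohorts by relative speaking rate.

    `edges` are hypothetical bin boundaries, e.g. [0.9, 1.1] for
    slow / normal / fast relative to a speaker's usual rate.
    """
    scores = np.asarray(cohort_scores, dtype=float)
    bins = np.digitize(np.asarray(cohort_rates, dtype=float), edges)
    return {int(b): scores[bins == b] for b in np.unique(bins)}


def normalize_local(raw_score, test_rate, cohort_scores, cohort_rates, edges):
    """Normalize with the local cohort whose rate bin matches the test utterance."""
    local = split_by_relative_rate(cohort_scores, cohort_rates, edges)
    b = int(np.digitize(test_rate, edges))
    # Fall back to the global cohort when the local one is missing or too small.
    if b not in local or len(local[b]) < 2:
        return normalize_global(raw_score, cohort_scores)
    return normalize_global(raw_score, local[b])


if __name__ == "__main__":
    # Toy cohort scores for one enrolled speaker, with relative speaking rates.
    scores = [0.0, 2.0, 10.0, 12.0, 20.0, 22.0]
    rates = [0.8, 0.85, 1.0, 1.0, 1.2, 1.3]
    edges = [0.9, 1.1]
    print(normalize_global(12.0, scores))
    print(normalize_local(12.0, 1.0, scores, rates, edges))
```

In this toy setup the local variant compares a test score only with cohort scores from utterances at a similar relative speaking rate, which is the intuition behind splitting the global cohort set; the fallback to the global cohort is one simple way to cope with sparse local bins and is likewise an assumption.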