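Authors: Zhang Xin, Fu Zhonghua [1,2] (School of Computer, Northwestern Polytechnical University, Xi'an 710129, China; Xi'an Iflytek Super-Brain Information Technology Co., Ltd., Xi'an 710076, China)
Affiliations: [1] School of Computer, Northwestern Polytechnical University, Xi'an 710129; [2] Xi'an Iflytek Super-Brain Information Technology Co., Ltd., Xi'an 710076
Source: Application Research of Computers (《计算机应用研究》), 2022, No. 6, pp. 1749-1752, 1759 (5 pages)
Funding: Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2018AAA0103100).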
Abstract: Target-speaker speech separation extracts one speaker's speech from a mixture in which several speakers talk at the same time, guided by a feature vector that identifies the target speaker. The feature vector is usually obtained in one of two ways. The first uses a set of predefined orthogonal one-hot vectors; this gives better results during training but cannot handle speakers that were not seen during training. The second uses a classification network to adaptively generate an embedding that carries the speaker's characteristics; errors in the classification network cost some training performance, but the approach generalizes well to out-of-set speakers. To address the shortcomings of using either one-hot vectors or embeddings alone, this paper proposes a robust target-speaker speech separation method based on hybrid training. During training, one-hot vectors and embeddings are used alternately as the identity feature vector of the target speaker, and both are mapped into a common space, which preserves training performance while improving generalization to out-of-set speakers. Experimental results show that, with this hybrid training method, the SDR on out-of-set speakers in the test set improves by more than 10 dB.
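The alternating use of two identity sources described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the class name `IdentityEncoder`, the even/odd alternation schedule, the use of plain linear maps into the common space, the dimensions, and the toy embedding values. The separation network and its loss are omitted entirely.

```python
import random


def one_hot(index, num_speakers):
    """Return a one-hot identity vector for a speaker seen in training."""
    vec = [0.0] * num_speakers
    vec[index] = 1.0
    return vec


def make_matrix(rows, cols, rng):
    """Small random matrix standing in for a learnable projection."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def project(matrix, vector):
    """Linear map: (rows x cols) matrix times a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class IdentityEncoder:
    """Hypothetical helper: maps either identity source into one common space."""

    def __init__(self, num_speakers, embedding_dim, common_dim, seed=0):
        rng = random.Random(seed)
        self.num_speakers = num_speakers
        # Assumption: one linear projection per identity source.
        self.one_hot_proj = make_matrix(common_dim, num_speakers, rng)
        self.embedding_proj = make_matrix(common_dim, embedding_dim, rng)

    def encode(self, step, speaker_index, embedding):
        """Alternate the source by step (assumed: even = one-hot, odd = embedding)."""
        if step % 2 == 0:
            vec = one_hot(speaker_index, self.num_speakers)
            return "one-hot", project(self.one_hot_proj, vec)
        return "embedding", project(self.embedding_proj, embedding)

    def encode_unseen(self, embedding):
        """Out-of-set speakers have no one-hot index, so only the embedding path applies."""
        return project(self.embedding_proj, embedding)


encoder = IdentityEncoder(num_speakers=4, embedding_dim=3, common_dim=2)
toy_embedding = [0.2, -0.1, 0.4]  # placeholder for a classifier-produced embedding
schedule = [encoder.encode(step, 1, toy_embedding)[0] for step in range(4)]
common_vector = encoder.encode_unseen(toy_embedding)
```

In this sketch both paths produce vectors of the same size, so a downstream separation network could be conditioned on either one without knowing which source was used. That is the property the common-space mapping is meant to provide at test time, when only the embedding is available for an unseen speaker.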
Classification code: TN912.35 [Electronics and Telecommunications: Communication and Information Systems]