Robust target speaker speech separation algorithm  Cited by: 1

Authors: Zhang Xin; Fu Zhonghua [1,2]

Affiliations: [1] School of Computer, Northwestern Polytechnical University, Xi'an 710129, China; [2] Xi'an Iflytek Super-Brain Information Technology Co., Ltd., Xi'an 710076, China

Source: Application Research of Computers (《计算机应用研究》), 2022, Issue 6, pp. 1749-1752, 1759 (5 pages)

Funding: Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2018AAA0103100).

Abstract: Target speaker speech separation extracts the speech of one specific speaker from a mixture in which several speakers talk simultaneously, guided by a feature vector that identifies the target. The feature vector is usually obtained in one of two ways. The first uses a set of predefined orthogonal one-hot vectors; this gives better results during training but cannot handle speakers not seen in training. The second uses a classification network to adaptively generate an embedding that carries speaker characteristics; this loses some training performance because of the classification network's error, but generalizes well to out-of-set speakers. To address the shortcomings of using either one-hot vectors or embeddings alone, this paper proposes a robust target speaker speech separation method: during training, one-hot vectors and embeddings are used alternately as the identity feature vector of the target speaker, and both are mapped into a common space, which preserves training performance while strengthening generalization to out-of-set speakers. Experimental results show that with this hybrid training method, the SDR improvement for out-of-set speakers in the test set exceeds 10 dB.
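The hybrid conditioning idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class `HybridConditioner`, its method names, the dimensions, the random linear projections, and the even/odd alternation schedule are all assumptions made for illustration. The abstract only states that one-hot vectors and embeddings are used alternately and mapped into a common space.

```python
import random


def one_hot(index, num_speakers):
    """Orthogonal one-hot identity vector for a speaker seen in training."""
    vec = [0.0] * num_speakers
    vec[index] = 1.0
    return vec


def project(matrix, vec):
    """Apply a linear map (list of rows) to a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def random_matrix(rows, cols, rng):
    """Small random weights standing in for learned projection parameters."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class HybridConditioner:
    """Hypothetical helper: maps either identity representation into one
    common conditioning space, so the separator sees a single vector type."""

    def __init__(self, num_speakers, embed_dim, common_dim, seed=0):
        rng = random.Random(seed)
        self.num_speakers = num_speakers
        # One projection per input type; both target the same common space.
        self.w_onehot = random_matrix(common_dim, num_speakers, rng)
        self.w_embed = random_matrix(common_dim, embed_dim, rng)

    def from_one_hot(self, speaker_index):
        return project(self.w_onehot, one_hot(speaker_index, self.num_speakers))

    def from_embedding(self, embedding):
        return project(self.w_embed, embedding)

    def for_training_step(self, step, speaker_index, embedding):
        # Assumed schedule: even steps use the one-hot vector, odd steps use
        # the embedding. The source does not specify the exact alternation.
        if step % 2 == 0:
            return "one_hot", self.from_one_hot(speaker_index)
        return "embedding", self.from_embedding(embedding)


if __name__ == "__main__":
    cond = HybridConditioner(num_speakers=4, embed_dim=8, common_dim=6)
    fake_embedding = [0.1] * 8  # stand-in for a speaker-classifier output
    for step in range(4):
        kind, vec = cond.for_training_step(step, 2, fake_embedding)
        print(step, kind, len(vec))
```

At inference time, an out-of-set speaker has no one-hot index, so only `from_embedding` would apply. Under the assumptions of this sketch, a separator trained on both input types through the shared space can still be conditioned on such a speaker.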

Keywords: speech separation; speaker recognition; embedding vector; one-hot vector

Classification code: TN912.35 [Electronics and Telecommunications - Communication and Information Systems]
