Structured Gaussian Mixture Model Under Constraint Conditions and Non-parallel Corpora Voice Conversion  Cited by: 3

Non-parallel Corpora Voice Conversion Based on Structured Gaussian Mixture Model Under Constraint Conditions


Authors: 车滢霞 (Che Yingxia), 俞一彪 (Yu Yibiao)[1]

Affiliation: [1] School of Electronic and Information Engineering, Soochow University, Suzhou, Jiangsu 215006, China

Source: Acta Electronica Sinica (《电子学报》), 2016, No. 9, pp. 2282-2288 (7 pages)

Funding: National Natural Science Foundation of China (No. 61271360); Natural Science Foundation of Jiangsu Province (No. BK20131196)

Abstract: This paper proposes a structured Gaussian mixture model with constraint conditions (C-SGMM) and a corresponding voice conversion method for non-parallel corpora. A small number of identical syllables are extracted from the original non-parallel corpora of the source and target speakers. During training of the structured Gaussian mixture model, the semantic information carried by these syllables and the correspondence between their acoustic features are used to constrain the cluster centers of K-means clustering and, in the Expectation-Maximization (EM) iterations, to adjust the posterior probability that a speech frame belongs to each model component. The Gaussian distributions of the source and target C-SGMMs are then matched and aligned using the acoustic universal structure (AUS) principle, and a short-time spectral conversion function is derived. Subjective and objective evaluations show that the proposed method outperforms the traditional structured-model voice conversion method in spectral distortion, target tendency, and speech quality: the average spectral distortion of the converted speech is only 0.52, the speaker recognition rate reaches 95.25%, and the average ABX target-tendency score is 0.82. Its performance comes closer to that of the conventional GMM-based voice conversion method trained on parallel corpora.
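The last modelling step in the abstract, deriving a conversion function once source and target Gaussians have been aligned, can be illustrated with a minimal sketch. The abstract does not give the actual function, so the code below assumes a common form for non-parallel conversion with paired components: each source frame is mapped by a posterior-weighted sum of per-component mean and variance mappings, with diagonal covariances. The names `posteriors`, `convert_frame`, and `pairing` are hypothetical, and the constrained training and AUS alignment steps are not reproduced here.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def posteriors(x, weights, means, variances):
    """Posterior probability of each mixture component given frame x."""
    logs = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]


def convert_frame(x, src, tgt, pairing):
    """Map one source frame into the target speaker's feature space.

    src, tgt: dicts with keys 'weights', 'means', 'variances' (diagonal).
    pairing[i]: index of the target component aligned with source
    component i. This is assumed to come from a separate alignment step.
    The mean/variance mapping used here is an assumption, not necessarily
    the function derived in the paper.
    """
    post = posteriors(x, src["weights"], src["means"], src["variances"])
    y = [0.0] * len(x)
    for i, p in enumerate(post):
        j = pairing[i]
        for d in range(len(x)):
            # Rescale the deviation from the source mean to the target spread.
            scale = math.sqrt(tgt["variances"][j][d] / src["variances"][i][d])
            y[d] += p * (tgt["means"][j][d] + scale * (x[d] - src["means"][i][d]))
    return y
```

With a single aligned pair, the mapping reduces to shifting the frame to the target mean and rescaling its deviation by the ratio of standard deviations; with several pairs, the source posteriors blend the per-component mappings.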

Keywords: voice conversion; structured Gaussian mixture model; non-parallel corpora; constraint conditions

Classification No.: TN912.33 [Electronics and Telecommunications: Communication and Information Systems]

 
