Voice conversion towards modeling dynamic characteristics using switching state space model

Authors: XU Ning, BAO JingYi, LIU XiaoFeng, JIANG AiMing, TANG YiBing

Affiliations: [1] College of Computer and Information Engineering, Hohai University; [2] Ministry of Education Key Laboratory of Broadband Wireless Communication and Sensor Network Technology, Nanjing University of Posts and Telecommunications; [3] School of Electronic Information and Electric Engineering, Changzhou Institute of Technology

Source: Science China (Information Sciences), English edition, 2013, No. 12, pp. 233-247 (15 pages)

Funding: Supported in part by the National Natural Science Foundation of China (Grant Nos. 11274092, 61271335); the Fundamental Research Funds for the Central Universities (Grant Nos. 2011B11114, 2011B11314, 2012B07314, 2012B04014); the National Natural Science Foundation for Young Scholars of China (Grant Nos. 61101158, 61201301, 31101643); the Jiangsu Province Natural Science Foundation for Young Scholars of China (Grant No. BK20130238); and the Open Research Fund of the Key Lab of Broadband Wireless Communication and Sensor Network Technology (Nanjing University of Posts and Telecommunications), Ministry of Education (Grant No. NYKL201305)

Abstract: In the voice conversion (VC) literature, the method based on the statistical Gaussian mixture model (GMM) serves as a benchmark. However, GMM has a well-known inherent drawback called the discontinuity problem. It transforms features on a frame-by-frame basis, so it ignores the dynamics between adjacent frames, and this ultimately degrades the quality of the converted speech. A variety of algorithms have been proposed to overcome this deficiency. Among them, the state space model (SSM) based method has given promising results. In this paper, we present an enhanced version of the traditional SSM, namely the switching SSM (SSSM). This new structure is more flexible than the conventional one because it uses a mixture of components to account for rapid transitions between neighboring frames. Moreover, the physical meaning of the SSSM model parameters is examined in depth, which leads to efficient application-specific training and conversion procedures for VC. Objective and subjective experiments were conducted to compare the conventional and the proposed SSM-based methods. The results show that SSSM yields clear improvements in both similarity and quality.
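The abstract does not state the SSSM equations, so the following is only a rough illustration of the general idea. It is a minimal sketch that assumes a linear-Gaussian state space model driven by the source features. In this sketch, a discrete component index switches between parameter sets (A, B, C, D). The function names `select_component` and `convert_sssm`, the hard Gaussian classification used to pick the component for each frame, and the noise-free mean propagation are all hypothetical. They are not the authors' training or conversion procedure.

```python
import numpy as np


def select_component(u, means, variances, weights):
    """Return the index of the diagonal-Gaussian component with the highest
    log-posterior for source frame u (hypothetical hard switching rule)."""
    log_p = (
        np.log(weights)
        - 0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
        - 0.5 * np.sum((u - means) ** 2 / variances, axis=1)
    )
    return int(np.argmax(log_p))


def convert_sssm(U, params, means, variances, weights):
    """Convert source frames U (T x d_u) into target-feature means (T x d_y).

    Assumed form for switching component k, with params[k] = (A, B, C, D):
        x_t = A x_{t-1} + B u_t   (hidden-state mean; process noise omitted)
        y_t = C x_t     + D u_t   (converted-feature mean; output noise omitted)
    Returns the converted frames and the selected component per frame.
    """
    n = params[0][0].shape[0]
    x = np.zeros(n)                  # hidden state starts at zero (assumption)
    Y, S = [], []
    for u in U:
        k = select_component(u, means, variances, weights)
        A, B, C, D = params[k]
        x = A @ x + B @ u            # the state carries context across frames
        Y.append(C @ x + D @ u)
        S.append(k)
    return np.array(Y), S


# Usage: a single 1-D component with A = B = 0.5 acts as exponential smoothing.
I1 = np.eye(1)
params = [(0.5 * I1, 0.5 * I1, I1, np.zeros((1, 1)))]
Y, S = convert_sssm(np.ones((3, 1)), params,
                    np.zeros((1, 1)), np.ones((1, 1)), np.array([1.0]))
# Y[:, 0] is [0.5, 0.75, 0.875]; S is [0, 0, 0]
```

If every A is set to zero, the output no longer depends on previous frames. The sketch then reduces to a per-frame piecewise-linear mapping, which is similar in spirit to frame-by-frame GMM conversion. The nonzero state transition is what lets adjacent frames influence each other.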

Keywords: discontinuity problem; dynamic characteristics; Gaussian mixture model; switching state space model; voice conversion

CLC classification: TN912.3 [Electronics and Telecommunications / Communication and Information Systems]
