Duration optimization of speaker adaptation in Mandarin TTS

Cited by: 1


Authors: Xu Yingjin[1], Jia Jia[1], Cai Lianhong[1]

Affiliation: [1] Department of Computer Science and Technology, Tsinghua University, Beijing 100084

Source: Journal of Tsinghua University (Science and Technology), 2013, No. 11, pp. 1597-1600, 1608 (5 pages)

Funding: National Natural Science Foundation of China (60928005; 60931160443)

Abstract: In Mandarin text-to-speech (TTS) synthesis, the durations of the unvoiced and voiced parts within a syllable strongly affect naturalness and are also a personalized feature that depends heavily on the speaker. This paper proposes an unvoiced/voiced duration optimization algorithm for speaker adaptation in hidden Markov model (HMM) based Mandarin TTS. The relative duration of the unvoiced part within each syllable of the source speaker's training corpus is clustered with a decision tree according to context features, and an adaptation algorithm then adapts the values in the decision tree to the target speaker's relative unvoiced durations. At synthesis time, the decision tree provides a reference relative unvoiced duration for the target speaker, and the unvoiced and voiced durations of the synthesized speech are adjusted to match it. Experiments show that the algorithm improves the accuracy of duration prediction for speaker adaptation in HMM-based Mandarin TTS and effectively improves both speaker similarity and the naturalness of the synthesized speech.
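The duration adjustment described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not name the adaptation algorithm, so `adapt_leaf_mean` substitutes a simple MAP-style interpolation between a source-speaker leaf value and the target-speaker data, with a hypothetical prior weight `tau`. The adjustment step assumes that the total syllable duration is preserved and only the unvoiced/voiced split changes, which the abstract does not state. All names (`SyllableDuration`, `relative_unvoiced_duration`, `adapt_leaf_mean`, `adjust_durations`) are illustrative.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SyllableDuration:
    unvoiced: float  # duration of the unvoiced part (frames or ms)
    voiced: float    # duration of the voiced part (same unit)


def relative_unvoiced_duration(d: SyllableDuration) -> float:
    """Share of the syllable duration taken by the unvoiced part."""
    total = d.unvoiced + d.voiced
    if total <= 0:
        raise ValueError("syllable duration must be positive")
    return d.unvoiced / total


def adapt_leaf_mean(source_mean: float,
                    target_ratios: Sequence[float],
                    tau: float = 10.0) -> float:
    """Move one decision-tree leaf value toward the target speaker.

    Assumption: MAP-style interpolation stands in for the unspecified
    adaptation algorithm; tau is a hypothetical prior weight.
    """
    n = len(target_ratios)
    if n == 0:
        # No adaptation data reached this leaf: keep the source value.
        return source_mean
    target_mean = sum(target_ratios) / n
    return (tau * source_mean + n * target_mean) / (tau + n)


def adjust_durations(d: SyllableDuration,
                     reference_ratio: float) -> SyllableDuration:
    """Re-split a syllable so its unvoiced share equals the reference.

    Assumption: the total syllable duration is kept unchanged.
    """
    if not 0.0 <= reference_ratio <= 1.0:
        raise ValueError("reference_ratio must lie in [0, 1]")
    total = d.unvoiced + d.voiced
    return SyllableDuration(unvoiced=total * reference_ratio,
                            voiced=total * (1.0 - reference_ratio))
```

For example, a leaf value adapted with `adapt_leaf_mean` can be passed as `reference_ratio` to `adjust_durations` for every synthesized syllable whose context falls into that leaf. Larger values of `tau` keep the result closer to the source speaker when little adaptation data is available.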

Keywords: Mandarin speech synthesis; speaker adaptation; duration optimization; unvoiced and voiced sounds

Classification: TN912.33 [Electronics and Telecommunications: Communication and Information Systems]

 
