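Affiliation: [1] Department of Computer Science and Technology, Tsinghua University, Beijing 100084
Source: Journal of Tsinghua University (Science and Technology), 2013, No. 11, pp. 1597-1600, 1608 (5 pages)
Funding: National Natural Science Foundation of China (60928005; 60931160443)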
Abstract: In Mandarin text-to-speech (TTS) synthesis, the durations of the unvoiced and voiced parts within a syllable are an important factor in naturalness and a personalized feature that depends strongly on the speaker. This paper proposes an unvoiced/voiced duration optimization algorithm for speaker adaptation in hidden Markov model (HMM) based Mandarin TTS. The relative duration of the unvoiced part within each syllable of the source speaker's training corpus is clustered with a decision tree according to context features, and an adaptation algorithm then adapts the values stored in the tree to the target speaker's relative unvoiced durations. At synthesis time, the tree supplies a reference relative unvoiced duration for the target speaker, and the unvoiced and voiced durations of the synthesized speech are adjusted to match it. Experiments show that the algorithm improves the accuracy of duration prediction for speaker adaptation in HMM-based Mandarin TTS and effectively improves both speaker similarity and the naturalness of the synthesized speech.
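The final adjustment step in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `adjust_durations`, the millisecond units, and the choice to keep the total syllable duration fixed while redistributing it are all assumptions made for illustration. The reference ratio stands in for the value that the adapted decision tree would return for a given syllable context.

```python
def adjust_durations(unvoiced_ms, voiced_ms, ref_unvoiced_ratio):
    """Redistribute a syllable's duration between its unvoiced and voiced parts.

    Hypothetical helper: assumes the syllable's total duration is preserved
    and only the split between the two parts changes.
    ref_unvoiced_ratio is the reference share of the syllable occupied by the
    unvoiced part (0.0 to 1.0), e.g. as looked up from an adapted decision tree.
    """
    if not 0.0 <= ref_unvoiced_ratio <= 1.0:
        raise ValueError("ref_unvoiced_ratio must lie in [0, 1]")
    if unvoiced_ms < 0 or voiced_ms < 0:
        raise ValueError("durations must be non-negative")

    total_ms = unvoiced_ms + voiced_ms
    new_unvoiced_ms = total_ms * ref_unvoiced_ratio
    new_voiced_ms = total_ms - new_unvoiced_ms
    return new_unvoiced_ms, new_voiced_ms


# Example: a 200 ms syllable predicted as 60 ms unvoiced + 140 ms voiced,
# with a reference unvoiced share of 25% for the target speaker.
new_unvoiced, new_voiced = adjust_durations(60.0, 140.0, 0.25)
print(new_unvoiced, new_voiced)
```

Keeping the total fixed leaves the syllable-level timing from the HMM duration model untouched and changes only the split inside the syllable. Whether the paper makes the same choice is not stated in the abstract.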
Classification: TN912.33 [Electronics and Telecommunications: Communication and Information Systems]