Affiliation: [1] School of Information, Yunnan University, Kunming, Yunnan
Source: Computer Science and Application (《计算机科学与应用》), 2023, No. 1, pp. 126-135 (10 pages)
Abstract: High-quality Indonesian speech corpora are scarce, so the performance of multi-speaker speech synthesis for Indonesian still has room to improve. To mitigate the effect of low resources on multi-speaker synthesis, this work studies and implements an end-to-end Indonesian speech synthesis system based on the GST-Tacotron2 framework. A system trained on 8.5 hours of single-speaker Indonesian data achieves a mean opinion score (MOS) of 4.11 for its synthesized speech. Building on this, a multi-speaker Indonesian synthesis system is designed, with a focus on how the speaker encoding method affects the naturalness of multi-speaker synthesis when only a small amount of speech from other Indonesian speakers is mixed into training. Experiments show that, with a model trained on a total of 14.5 hours of multi-speaker speech, the main speaker's synthesized speech reaches a MOS of 4.12, and its mel-cepstral distortion (MCD) is 7.2% lower than that of the best single-speaker model. The MOS of the other speakers' synthesized speech is above 3.60 in every case, verifying the effectiveness of the proposed method.
Classification code: TN912.33 [Electronics and Telecommunications: Communication and Information Systems]