Autoregressive Zero-shot Speech Synthesis Based on Phoneme-level Prosody Modeling


Authors: YUE Huanjing (岳焕景); WANG Jiawei (王嘉玮); YANG Jingyu (杨敬钰)

Affiliation: [1] School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China

Source: Journal of Hunan University (Natural Sciences), 2025, No. 4, pp. 114-123 (10 pages)

Funding: National Natural Science Foundation of China (61672378).

Abstract: To improve the naturalness and robustness of synthesized prosody, an autoregressive speech synthesis model based on phoneme-level prosody modeling is proposed. The model improves prosody modeling in two respects: inter-word pauses and phoneme durations. To improve the diversity and accuracy of inter-word pauses, a pause prediction module is added to the text front end; it predicts multi-class pause labels from the raw text, providing an accurate reference for pause-duration modeling during synthesis. To improve the naturalness of phoneme durations, a duration prediction module predicts a Gaussian mixture distribution for each phoneme and obtains diverse phoneme durations by random sampling. To stabilize phoneme-duration modeling in the autoregressive model, an attention discrimination module is applied at every autoregressive time step and uses attention and discrimination mechanisms to avoid alignment disorder. Experimental results show that the three proposed modules effectively improve the naturalness and robustness of prosody modeling, and thereby the quality of the synthesized speech.

Keywords: speech synthesis; prosody modeling; pause prediction

Classification number: TP37 [Automation and Computer Technology: Computer System Architecture]
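
The abstract states that the duration prediction module predicts a Gaussian mixture for each phoneme and draws durations by random sampling. The sketch below only illustrates that sampling step and is not the authors' implementation. The `DurationMixture` container, the log-frame parameterization, the one-frame minimum, and the example parameter values are all assumptions made for illustration; the abstract does not specify them.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class DurationMixture:
    """Hypothetical per-phoneme output of a duration predictor."""
    weights: Sequence[float]  # mixture weights (non-negative, ideally summing to 1)
    means: Sequence[float]    # component means in log-frames (assumption)
    stds: Sequence[float]     # component standard deviations, each > 0


def sample_duration(mix: DurationMixture, rng: random.Random,
                    min_frames: int = 1) -> int:
    """Draw one duration, in frames, from a Gaussian mixture."""
    # Choose a mixture component in proportion to its weight.
    k = rng.choices(range(len(mix.weights)), weights=mix.weights, k=1)[0]
    # Sample from the chosen Gaussian in the log domain (assumption).
    log_duration = rng.gauss(mix.means[k], mix.stds[k])
    # Convert back to frames and clamp so no phoneme is dropped.
    return max(min_frames, round(math.exp(log_duration)))


def sample_durations(mixtures: Sequence[DurationMixture],
                     seed: Optional[int] = None) -> List[int]:
    """Sample one duration per phoneme; a fixed seed makes the draw repeatable."""
    rng = random.Random(seed)
    return [sample_duration(m, rng) for m in mixtures]


# Made-up mixtures for a two-phoneme input.
mixtures = [
    DurationMixture(weights=[0.7, 0.3], means=[1.6, 2.3], stds=[0.2, 0.3]),
    DurationMixture(weights=[0.5, 0.5], means=[2.0, 2.8], stds=[0.25, 0.25]),
]
print(sample_durations(mixtures, seed=0))
print(sample_durations(mixtures, seed=1))
```

Different seeds can give different duration sequences for the same text, which is the source of the diversity the abstract mentions, while reusing a seed reproduces the same sequence.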

 
