Non-autoregressive speech synthesis method combined with lightweight convolution (结合轻量卷积的非自回归语音合成方法)


Authors: ZHONG Qiao-xia (钟巧霞); ZENG Bi (曾碧)[1]; LIN Zhen-tao (林镇涛); LIN Wei (林伟)[1] (School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006, China)

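Affiliation: [1] School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006, Guangdong, China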

Source: Computer Engineering and Design (《计算机工程与设计》), 2024, No. 4, pp. 1166-1172 (7 pages)

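Funding: National Natural Science Foundation of China (62172111); Natural Science Foundation of Guangdong Province (2019A1515011056); Shunde District Core Technology Research Fund (2130218003002).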

Abstract: This paper studies how to effectively capture the relationships between phonemes and how to synthesize prosody-rich audio, and proposes LCTTS, a non-autoregressive speech synthesis model combined with lightweight convolution. Lightweight convolution is introduced to establish connections between phonemes, which addresses pronunciation errors. Pitch and energy predictors are added to predict the prosody of the generated speech, which addresses the lack of prosody in the audio. The model is trained to produce mel-spectrograms, which a pre-trained vocoder then converts into audio. Experimental results show that the proposed LCTTS model outperforms the previously proposed SpeedySpeech model: on the Emotional Speech Database dataset, the mean opinion score improves by 2.8% and the mel-cepstral distortion decreases by 0.15.
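The abstract does not say which formulation of lightweight convolution LCTTS uses. As an illustration only, the sketch below assumes the commonly cited form: a depthwise 1-D convolution over the phoneme sequence in which each kernel is softmax-normalized over its time axis and shared by all channels in the same head. The function names, the list-based data layout, and the zero padding at the sequence edges are assumptions made for this example and are not taken from the paper.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def lightweight_conv(x, raw_kernels):
    """Apply a lightweight convolution along the time axis.

    x:           list of T frames, each a list of C channel values
                 (for example, phoneme embeddings).
    raw_kernels: list of H kernels, each a list of K raw weights (K odd).
                 Channels are split into H contiguous groups, and every
                 channel in a group shares that group's kernel.
    """
    T, C = len(x), len(x[0])
    H, K = len(raw_kernels), len(raw_kernels[0])
    if C % H != 0 or K % 2 != 1:
        raise ValueError("C must be divisible by H and K must be odd")

    # Normalize each head's kernel so its weights sum to 1 over time.
    kernels = [softmax(k) for k in raw_kernels]
    pad = K // 2
    group = C // H

    out = [[0.0] * C for _ in range(T)]
    for t in range(T):
        for c in range(C):
            w = kernels[c // group]  # kernel shared within the head
            acc = 0.0
            for j in range(K):
                src = t + j - pad
                if 0 <= src < T:  # positions outside the sequence act as zeros
                    acc += w[j] * x[src][c]
            out[t][c] = acc
    return out
```

With H heads and kernel width K, the layer has only H × K learnable weights, regardless of the number of channels. Each output frame is a weighted average of its K neighbouring phoneme frames, which is one way to model local context between adjacent phonemes.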

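Keywords: speech synthesis; lightweight convolution; prosody synthesis; mel-spectrogram generation; non-autoregressive methods; deep learning; natural language processing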

Classification number: TP912.33 [Automation and Computer Technology]

 
