Expressive Speech Synthesis Method Based on Tacotron Model and Prosodic Correction

Cited by: 2


Authors: ZHANG Xin, HU Hangye, CAO Xinyi, WANG Wei[1] (College of Education Science, Nanjing Normal University, Nanjing 210097, China)

Affiliation: [1] College of Education Science, Nanjing Normal University, Nanjing 210097

Source: Journal of Data Acquisition and Processing, 2022, No. 4, pp. 909-916 (8 pages)

Funding: National Philosophy and Social Science Foundation of China (BCA150054).

Abstract: Speech synthesis technology is becoming increasingly mature. To improve the quality of synthesized emotional speech, this study proposes a method that combines end-to-end emotional speech synthesis with prosodic correction: the prosodic parameters of the emotional speech synthesized by the Tacotron model are modified to strengthen the emotional expressiveness of the synthesis system. First, a Tacotron model is trained on a large neutral corpus and then further trained on a small emotional corpus, so that it can synthesize emotional speech. Next, the Praat acoustic analysis tool is used to analyze the prosodic features of the emotional speech in the corpus and to summarize the patterns that the parameters follow under different emotional states. Finally, these patterns are used to correct the fundamental frequency (F0), duration, and energy of the corresponding emotional speech synthesized by Tacotron, making the emotional expression more accurate. The results of an objective emotion recognition experiment and a subjective evaluation show that the method can synthesize emotional speech that is relatively natural and more expressive.
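The abstract describes the correction step only at a high level. The Python sketch below illustrates one plausible reading of it, under assumptions that are ours rather than the paper's: each emotion's "parameter pattern" is taken to be the ratio of its mean F0, duration, and energy to those of the neutral corpus (as measured, for example, with Praat), and the correction is applied to frame-level F0 and energy tracks of a synthesized utterance. `ProsodyStats`, `emotion_rule`, `stretch`, and `correct_prosody` are hypothetical names. The sketch only rewrites parameter tracks; turning them back into audio would need a resynthesis step such as PSOLA (for example, Praat's Manipulation objects), which is not shown.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ProsodyStats:
    """Utterance-level prosodic measurements, e.g. exported from Praat (hypothetical schema)."""
    f0_mean_hz: float   # mean F0 over voiced frames, in Hz
    duration_s: float   # utterance duration, in seconds
    energy_mean: float  # mean frame energy on a linear scale


def emotion_rule(neutral: list[ProsodyStats], emotional: list[ProsodyStats]) -> dict[str, float]:
    """Summarize one emotion as ratios of its mean parameters to the neutral means (assumed rule form)."""
    return {
        "f0": mean(s.f0_mean_hz for s in emotional) / mean(s.f0_mean_hz for s in neutral),
        "duration": mean(s.duration_s for s in emotional) / mean(s.duration_s for s in neutral),
        "energy": mean(s.energy_mean for s in emotional) / mean(s.energy_mean for s in neutral),
    }


def stretch(track: list[float], factor: float) -> list[float]:
    """Time-scale a frame-level track by `factor` (<1 = shorter) with nearest-neighbour
    resampling, so unvoiced F0 frames (value 0) are never blended with voiced ones."""
    if not track:
        raise ValueError("track must be non-empty")
    n = len(track)
    new_len = max(1, round(n * factor))
    if n == 1 or new_len == 1:
        return [track[0]] * new_len
    step = (n - 1) / (new_len - 1)
    return [track[round(i * step)] for i in range(new_len)]


def correct_prosody(
    f0: list[float], energy: list[float], rule: dict[str, float]
) -> tuple[list[float], list[float]]:
    """Apply an emotion rule to the frame-level F0 and energy tracks of one synthesized utterance."""
    f0_scaled = [v * rule["f0"] if v > 0 else 0.0 for v in f0]  # unvoiced frames stay at 0
    energy_scaled = [e * rule["energy"] for e in energy]
    return stretch(f0_scaled, rule["duration"]), stretch(energy_scaled, rule["duration"])


if __name__ == "__main__":
    # Made-up numbers for illustration only; not measurements from the paper.
    neutral = [ProsodyStats(190.0, 1.5, 0.5), ProsodyStats(210.0, 2.5, 1.5)]
    angry = [ProsodyStats(230.0, 1.25, 1.25), ProsodyStats(250.0, 1.75, 1.75)]
    rule = emotion_rule(neutral, angry)
    f0 = [0.0, 180.0, 200.0, 220.0, 0.0, 0.0, 190.0, 210.0]
    energy = [0.1, 0.5, 0.6, 0.7, 0.1, 0.1, 0.4, 0.5]
    new_f0, new_energy = correct_prosody(f0, energy, rule)
    print(rule)
    print(new_f0)
    print(new_energy)
```

Nearest-neighbour resampling is chosen for the duration change so that the zeros marking unvoiced frames survive intact; linear interpolation would invent spurious F0 values at voicing boundaries. If frame energy is measured as mean squared amplitude, the matching waveform gain would be the square root of the energy ratio.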

Keywords: speech synthesis; end-to-end synthesis; prosodic correction; emotional speech

CLC number: TP391 [Automation and Computer Technology / Computer Application Technology]
