Design of English Machine Translation Robot Based on Speech Synthesis (基于语音合成的英语机器翻译机器人设计)  Cited by: 1

Author: ZHANG Guanping (张冠萍) [1]

Affiliation: [1] Xi'an Siyuan University, Xi'an 710038, China

Source: Automation & Instrumentation (《自动化与仪器仪表》), 2023, No. 2, pp. 247-252 (6 pages)

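Funding: Shaanxi Provincial Education Science Planning Project "Research on the Current Status and Effects of Mixed-Age Education in Kindergartens" (SGH17H448).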

Abstract: Current English translation robots synthesize speech with low realism, which degrades human-computer interaction. To address this, an English translation robot based on speech synthesis is designed around the Bert TTS speech synthesis model. Building on the Seq2Seq structure, an attention mechanism is added to obtain the mel spectrogram of the input speech; pre-trained Bert and WaveNet network architectures then serve as the encoder and the speech generator, respectively, learning to generate the time-domain waveform of English speech, which the Bert TTS model synthesizes into speech. Experimental results show that, on the same speech dataset, the model's naturalness MOS and similarity MOS for synthesized speech stay at around 378985 and 4.12 points, respectively, a small error relative to real speech. At 500 and 1,000 iterations, the model's MOS scores are 4.56 and 4.42, both higher than those of the traditional Tacotron2 speech synthesis model. The model therefore improves the realism and naturalness of synthesized English speech, and synthesis quality improves markedly.
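
As a rough illustration of how MOS figures like those in the abstract are typically obtained: a mean opinion score is the arithmetic mean of listener ratings on a 1-to-5 scale. The sketch below is not taken from the paper. The function name `mean_opinion_score`, the sample ratings, and the normal-approximation confidence interval are all assumptions chosen for illustration.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (MOS, 95% CI half-width) for listener ratings on a 1-5 scale.

    Hypothetical helper for illustration; the paper does not describe
    its exact scoring procedure.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie in [1, 5]")
    mos = statistics.mean(ratings)
    # Normal approximation: 1.96 * sample standard deviation / sqrt(n).
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Made-up ratings from eight listeners for one synthesized utterance.
sample_ratings = [5, 4, 4, 5, 4, 5, 4, 4]
mos, half_width = mean_opinion_score(sample_ratings)
print(f"MOS = {mos:.2f} +/- {half_width:.2f}")
```

Reporting the interval alongside the mean helps judge whether a gap between two systems, such as the one reported against Tacotron2, is larger than listener-to-listener variation.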

Keywords: speech synthesis; English translation robot; Seq2Seq; WaveNet network; Bert TTS model

Classification code: TP392 [Automation and Computer Technology: Computer Application Technology]

 
