Author: ZHANG Guanping (张冠萍)[1]
Affiliation: [1] Xi'an Siyuan University, Xi'an 710038, China
Source: Automation & Instrumentation (《自动化与仪器仪表》), 2023, No. 2, pp. 247-252 (6 pages)
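Funding: Shaanxi Provincial Education Science Planning Project "Research on the Current Status and Effects of Mixed-Age Education in Kindergartens" (SGH17H448).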
Abstract: Current English translation robots synthesize speech with low realism, which degrades human-computer interaction. To address this, an English translation robot based on speech synthesis is designed around a Bert TTS speech synthesis model. Building on a Seq2Seq structure, an attention mechanism is added to obtain the mel spectrogram of the input speech; a pre-trained Bert and a WaveNet network architecture then serve as the encoder and the speech generator, respectively, learning to generate the time-domain waveform of English speech, from which the Bert TTS model synthesizes speech. Experimental results show that, on the same speech dataset, the model's naturalness MOS and similarity MOS for synthesized speech stay at around 378985 and 4.12 points, respectively, with a small error relative to real speech. At 500 and 1,000 iterations, the model's MOS scores are 4.56 and 4.42, both higher than those of the traditional Tacotron2 speech synthesis model. The model can therefore improve the realism and naturalness of synthesized English speech, and synthesis quality improves significantly.
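Keywords: speech synthesis; English translation robot; Seq2Seq; WaveNet network; Bert TTS model
Classification code: TP392 [Automation and Computer Technology: Computer Application Technology]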