Speech Synthesis Method Based on Improved Attention Mechanism

Authors: CHEN Ruo-fei, WANG Jing-cheng[2], LI Ji-chao[1], ZHANG Bin-bin[1]

Affiliations: [1] School of Electronic and Information Engineering, Xi'an Technological University, Xi'an, Shaanxi 710021, China; [2] School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China

Source: Computer Simulation, 2025, No. 2, pp. 193-197 (5 pages)

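Funding: National Key Research and Development Program of China (2022YFE0123400); Shaanxi Province Special Program for Technological Innovation Guidance (2022QFY01-16).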

Abstract: Speech synthesis based on Local Sensitive Attention suffers from poor robustness on long sentences, poor alignment, and information loss. Three improvements are proposed to address these problems. First, depthwise separable convolution replaces standard convolution. This both reduces the number of model parameters and increases the depth of the convolutional layers, which improves the feature-representation ability of the convolution. Second, the range of the attention energies (Energies) is narrowed by multiplying them by a reduction factor α. This keeps excessively large values out of the subsequent Softmax, improving model stability and alignment on long sequences. Finally, a two-layer BiGRU replaces the single-layer BiLSTM, which better extracts contextual semantics and strengthens feature information, reducing information loss and improving synthesis quality. Experimental results show that, compared with the original model, the improved model aligns well when synthesizing long sentences and is more robust. It also improves on the timbre quality of the original model while reducing the training loss by 7%.
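The three measures in the abstract can be made concrete with a little arithmetic. The pure-Python sketch below is not the authors' code. It does three things. It counts the parameters of a standard 1-D convolution and of a depthwise separable one. It applies the reduction factor α to attention energies before a softmax. It counts the parameters of a single-layer BiLSTM and of a two-layer BiGRU. All layer sizes are hypothetical illustration choices: kernel 5, 512 channels, 512-dimensional encoder input, and 256 hidden units per direction. The value α = 0.5 is also hypothetical. The recurrent formula assumes the convention of separate input and recurrent bias vectors per gate, as used by PyTorch's `nn.LSTM` and `nn.GRU`.

```python
import math


def conv1d_params(k: int, c_in: int, c_out: int, bias: bool = True) -> int:
    """Standard 1-D convolution: one k x c_in filter per output channel."""
    return k * c_in * c_out + (c_out if bias else 0)


def dsconv1d_params(k: int, c_in: int, c_out: int, bias: bool = True) -> int:
    """Depthwise separable 1-D convolution: a k-tap depthwise filter per input
    channel, followed by a 1x1 pointwise convolution that mixes channels."""
    depthwise = k * c_in + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


def scaled_softmax(energies: list[float], alpha: float) -> list[float]:
    """Attention weights computed from energies multiplied by a reduction factor.

    Subtracting the maximum keeps exp() in range. A factor alpha < 1 also
    narrows the spread of the logits, which yields a flatter distribution.
    """
    scaled = [alpha * e for e in energies]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [x / total for x in exps]


def birnn_params(gates: int, input_size: int, hidden: int, layers: int) -> int:
    """Parameters of a bidirectional recurrent stack (LSTM: gates=4, GRU: gates=3).

    For each gate, this counts the input weights, the recurrent weights and
    two bias vectors (input and recurrent).
    """
    total = 0
    layer_in = input_size
    for _ in range(layers):
        per_direction = gates * (hidden * layer_in + hidden * hidden + 2 * hidden)
        total += 2 * per_direction      # forward and backward directions
        layer_in = 2 * hidden           # next layer sees concatenated outputs
    return total


if __name__ == "__main__":
    # Hypothetical convolution sizes: kernel 5, 512 input and output channels.
    print("standard conv:", conv1d_params(5, 512, 512))
    print("depthwise separable conv:", dsconv1d_params(5, 512, 512))

    energies = [2.0, 1.0, 0.0]          # hypothetical attention energies
    print("alpha = 1.0:", scaled_softmax(energies, 1.0))
    print("alpha = 0.5:", scaled_softmax(energies, 0.5))  # alpha is an assumption

    # Hypothetical encoder sizes: 512-dim input, 256 hidden units per direction.
    print("1-layer BiLSTM:", birnn_params(4, 512, 256, 1))
    print("2-layer BiGRU:", birnn_params(3, 512, 256, 2))
```

With these sizes, the depthwise separable layer needs 265,728 parameters and the standard layer needs 1,311,232, roughly a fivefold reduction. Scaling the example energies by 0.5 lowers the largest attention weight from about 0.67 to about 0.51. At equal width, the two-layer BiGRU (2,365,440 parameters) is larger than the single-layer BiLSTM (1,576,960). The BiGRU change therefore adds depth rather than saving parameters, unless the paper uses a smaller hidden size, which the abstract does not state.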

Keywords: attention mechanism; speech synthesis; depthwise separable convolution; long sequence

CLC number: TB183 [General Industrial Technology]