Authors: CHEN Ruo-fei (陈若飞), WANG Jing-cheng (王景成)[2], LI Ji-chao (李继超)[1], ZHANG Bin-bin (张彬彬)[1]
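Affiliations: [1] School of Electronic and Information Engineering, Xi'an Technological University, Xi'an, Shaanxi 710021, China; [2] School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China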
Source: Computer Simulation (《计算机仿真》), 2025, No. 2, pp. 193-197 (5 pages)
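Funding: National Key Research and Development Program of China (2022YFE0123400); Shaanxi Provincial Special Program for Technological Innovation Guidance (2022QFY01-16).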
Abstract: Speech synthesis based on Local Sensitive Attention suffers from poor robustness on long sentences, poor alignment, and information loss. To address these issues, three improvements are proposed. First, depthwise separable convolution replaces standard convolution, which reduces the number of model parameters while increasing the depth of the convolutional layers, thereby improving the feature-expression ability of the convolution operation. Second, the range of the Energies values is narrowed by multiplying them by a reduction factor α, which prevents excessively large values from reaching the subsequent Softmax function and improves both model stability and alignment on long sequences. Finally, the single-layer BiLSTM is replaced with a two-layer BiGRU, which better extracts contextual semantics and strengthens feature information, reducing information loss and improving synthesis quality. Experimental results show that, compared with the original model, the improved model aligns well when synthesizing long sentences and is more robust; while also improving the timbre quality of the original model, it reduces the training loss by 7%.
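The three changes described in the abstract can be illustrated with a little arithmetic. The Python sketch below is not the authors' code: the layer widths (512 input channels, kernel size 5, 256 hidden units per direction) are assumptions borrowed from a Tacotron 2-style encoder rather than values reported in the paper, and all function names are hypothetical. It counts weights for a standard versus a depthwise separable 1-D convolution, shows how multiplying attention energies by a factor α < 1 before Softmax flattens the resulting weights (equivalent to a softmax temperature of 1/α), and counts parameters for a single-layer BiLSTM versus a two-layer BiGRU using PyTorch's convention of two bias vectors per gate.

```python
import math

# ASSUMPTION: illustrative Tacotron 2-style sizes, not values from the paper.
C_IN, C_OUT, K = 512, 512, 5
HIDDEN = 256  # hypothetical hidden units per direction


def standard_conv1d_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a standard 1-D convolution (bias omitted)."""
    return c_in * c_out * k


def separable_conv1d_params(c_in: int, c_out: int, k: int) -> int:
    """Depthwise (one k-tap filter per input channel) plus pointwise (1x1) weights, bias omitted."""
    return c_in * k + c_in * c_out


def softmax(xs):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_attention_weights(energies, alpha):
    """Multiply attention energies by a reduction factor alpha, then apply softmax."""
    return softmax([alpha * e for e in energies])


def rnn_params(input_size: int, hidden: int, gates: int) -> int:
    """One direction of one recurrent layer, PyTorch-style (input and hidden bias per gate)."""
    return gates * (hidden * input_size + hidden * hidden + 2 * hidden)


def bilstm_params(input_size: int, hidden: int) -> int:
    """Single-layer bidirectional LSTM (4 gates per direction)."""
    return 2 * rnn_params(input_size, hidden, gates=4)


def stacked_bigru_params(input_size: int, hidden: int, layers: int) -> int:
    """Stacked bidirectional GRU (3 gates per direction)."""
    total, in_size = 0, input_size
    for _ in range(layers):
        total += 2 * rnn_params(in_size, hidden, gates=3)
        in_size = 2 * hidden  # the next layer sees both directions concatenated
    return total


if __name__ == "__main__":
    print("standard conv:", standard_conv1d_params(C_IN, C_OUT, K))    # 1310720
    print("separable conv:", separable_conv1d_params(C_IN, C_OUT, K))  # 264704

    energies = [2.0, 8.0, 30.0]  # hypothetical raw attention energies
    print("max weight, alpha=1.0:", max(scaled_attention_weights(energies, 1.0)))
    print("max weight, alpha=0.1:", max(scaled_attention_weights(energies, 0.1)))

    print("1-layer BiLSTM:", bilstm_params(C_IN, HIDDEN))              # 1576960
    print("2-layer BiGRU:", stacked_bigru_params(C_IN, HIDDEN, 2))     # 2365440
```

With these assumed sizes the separable convolution needs roughly a fifth of the weights, and scaling by α = 0.1 keeps the largest attention weight well below 1 instead of saturating; the two-layer BiGRU, however, has more parameters than the single BiLSTM, so any net parameter saving depends on layer sizes the abstract does not state.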