Authors: Huang Xiangdong; Wang Junqin; Ma Jinying; Zhang Xuanyi
Affiliations: [1] School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China; [2] School of Electronic Engineering, Tianjin University of Technology and Education, Tianjin 300222, China; [3] Georgia Tech Shenzhen Institute (GTSI), Tianjin University, Shenzhen 518067, China
Source: Journal of Tianjin University: Science and Technology, 2023, No. 8, pp. 815-822 (8 pages)
Funding: General Program of the Qinghai Province Basic Research Plan (2021-ZJ-910).
Abstract: To achieve high-quality, high-efficiency, and low-cost speech synthesis, a generative adversarial network (GAN) vocoder based on all-phase filter bank band discrimination (APFB-GAN) is designed and developed. The vocoder takes the existing high-fidelity generative adversarial network (HiFi-GAN) as a reference. In the generator, the parameters of the HiFi-GAN multi-receptive field fusion module are cut by about 60%. In the discriminator, two improvements are made. First, the multi-scale discriminator and multi-period discriminator of HiFi-GAN are replaced with a discriminator based on an all-phase filter bank, which overcomes the original model's inability to extract features flexibly according to the nonuniform distribution of speech energy across frequency bands. Second, a band-weighted, multi-window-length short-time Fourier transform (STFT) spectral loss function is proposed, which works together with the discriminator to stabilize training. Experimental results show that the quality of speech synthesized by the APFB-GAN vocoder is comparable to that of HiFi-GAN, and its high-frequency details are more prominent. The proposed model has only 28.78% of the parameters of HiFi-GAN, and its synthesis speed on a GPU is 2.4 times that of HiFi-GAN.
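As a rough illustration of what a band-weighted, multi-window-length STFT spectral loss might look like, the following NumPy sketch averages a weighted L1 distance between magnitude spectrograms over several window lengths. The abstract does not give the paper's actual formulation, so everything here is an assumption made for illustration: the function names, the window lengths (256, 512, 1024), the hop of one quarter window, the single split point between a low and a high band, and the weight of 2.0 on the high band are all hypothetical.

```python
import numpy as np


def stft_magnitude(signal, win_length, hop_length):
    """Magnitude STFT with a Hann window; shape (frames, win_length // 2 + 1)."""
    window = np.hanning(win_length)
    n_frames = 1 + (len(signal) - win_length) // hop_length
    frames = np.stack([
        signal[i * hop_length: i * hop_length + win_length] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1))


def band_weighted_stft_loss(reference, generated,
                            win_lengths=(256, 512, 1024),
                            high_band_weight=2.0, split=0.5):
    """Weighted L1 spectral distance averaged over several window lengths.

    Assumption (not from the paper): bins above `split` (a fraction of the
    Nyquist range) are weighted by `high_band_weight`, all others by 1.
    """
    total = 0.0
    for win_length in win_lengths:
        hop_length = win_length // 4  # hypothetical choice of hop size
        ref_mag = stft_magnitude(reference, win_length, hop_length)
        gen_mag = stft_magnitude(generated, win_length, hop_length)
        n_bins = ref_mag.shape[1]
        weights = np.ones(n_bins)
        weights[int(split * n_bins):] = high_band_weight
        # Weights broadcast across frames; the mean runs over frames and bins.
        total += np.mean(weights * np.abs(ref_mag - gen_mag))
    return total / len(win_lengths)


# Usage: a low-frequency tone versus the same tone with a high-frequency error.
n = np.arange(4096)
reference = np.sin(2 * np.pi * 0.05 * n)
generated = reference + 0.1 * np.sin(2 * np.pi * 0.4 * n)
loss = band_weighted_stft_loss(reference, generated)
```

With these settings, an error concentrated in the upper half of the spectrum is penalized more heavily than it would be under uniform weighting, which is one plausible way a loss could emphasize high-frequency detail.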
Classification number: TN912.33 [Electronics and Telecommunications: Communication and Information Systems]