Design of Generative Adversarial Network Vocoder Based on All-Phase Filter Bank Discrimination  Cited by: 1


Authors: Huang Xiangdong; Wang Junqin; Ma Jinying; Zhang Xuanyi (School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China; School of Electronic Engineering, Tianjin University of Technology and Education, Tianjin 300222, China; Georgia Tech Shenzhen Institute (GTSI), Tianjin University, Shenzhen 518067, China)

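Affiliations: [1] School of Electrical and Information Engineering, Tianjin University, Tianjin 300072 [2] School of Electronic Engineering, Tianjin University of Technology and Education, Tianjin 300222 [3] Georgia Tech Shenzhen Institute, Tianjin University, Shenzhen 518067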

Source: Journal of Tianjin University (Science and Technology), 2023, No. 8, pp. 815-822 (8 pages)

Funding: General Program of the Basic Research Plan of Qinghai Province (2021-ZJ-910).

Abstract: To achieve high-quality, high-efficiency, and low-cost speech synthesis, a generative adversarial network (GAN) vocoder based on all-phase filter bank band discrimination, APFB-GAN, is designed and developed. The vocoder takes the existing HiFi-GAN as a reference. In the generator, the parameters of HiFi-GAN's multi-receptive field fusion module are cut by about 60%. In the discriminator, two improvements are made. First, the multi-scale discriminator and multi-period discriminator of HiFi-GAN are replaced with a discriminator based on an all-phase filter bank, which overcomes the original model's inability to extract features flexibly according to the nonuniform distribution of speech energy across frequency bands. Second, a band-weighted, multi-window-length short-time Fourier transform (STFT) spectral loss function is proposed, which works together with the discriminator to stabilize training. Experimental results show that the speech synthesized by the APFB-GAN vocoder is comparable in quality to that of HiFi-GAN, with more prominent high-frequency detail; the model has only 28.78% of HiFi-GAN's parameters, and its synthesis speed on a GPU is 2.4 times that of HiFi-GAN.

Keywords: speech synthesis; vocoder; generative adversarial network; all-phase filter bank

Classification code: TN912.33 [Electronics and Telecommunications: Communication and Information Systems]
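
Note: The abstract names a band-weighted, multi-window-length STFT spectral loss but gives no formula for it. The sketch below shows only the general idea, under assumptions that do not come from the paper: an L1 distance between magnitude spectra, Hann windows of lengths 16, 32 and 64, a hop of one quarter of the window, and three illustrative bands with hypothetical gains. All function names are hypothetical, and a direct DFT is used so that the example has no dependencies.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def stft_magnitude(signal, win_len, hop):
    """Magnitude STFT via a direct DFT (slow but dependency-free).

    Returns a list of frames, each with win_len // 2 + 1 magnitudes.
    """
    window = hann(win_len)
    frames = []
    for start in range(0, len(signal) - win_len + 1, hop):
        seg = [signal[start + i] * window[i] for i in range(win_len)]
        frame = []
        for k in range(win_len // 2 + 1):
            acc = sum(
                seg[i] * cmath.exp(-2j * math.pi * k * i / win_len)
                for i in range(win_len)
            )
            frame.append(abs(acc))
        frames.append(frame)
    return frames


def band_weights(num_bins, band_edges, band_gains):
    """Per-bin weights from band gains.

    band_edges are normalized frequencies in [0, 1] (1 = Nyquist); the last
    edge must be 1.0. Band b covers frequencies up to band_edges[b + 1].
    """
    weights = []
    for k in range(num_bins):
        f = k / (num_bins - 1)
        for b, gain in enumerate(band_gains):
            if f <= band_edges[b + 1]:
                weights.append(gain)
                break
    return weights


def weighted_multi_window_stft_loss(
    ref,
    gen,
    win_lens=(16, 32, 64),              # assumed window lengths
    band_edges=(0.0, 0.25, 0.5, 1.0),   # assumed band layout
    band_gains=(1.0, 1.5, 2.0),         # assumed gains, higher bands weighted more
):
    """Average over window lengths of the band-weighted mean L1 spectral error."""
    total = 0.0
    for win_len in win_lens:
        hop = win_len // 4
        ref_mag = stft_magnitude(ref, win_len, hop)
        gen_mag = stft_magnitude(gen, win_len, hop)
        if not ref_mag or not gen_mag:
            raise ValueError("signal is shorter than the analysis window")
        w = band_weights(win_len // 2 + 1, band_edges, band_gains)
        err, count = 0.0, 0
        for rf, gf in zip(ref_mag, gen_mag):
            for k in range(len(w)):
                err += w[k] * abs(rf[k] - gf[k])
                count += 1
        total += err / count
    return total / len(win_lens)
```

With these assumed gains, an error that falls in the upper half of the spectrum contributes twice as much as the same error in the lowest quarter. This is one plausible way a loss could emphasize high-frequency detail; the weighting actually used in the paper may differ.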

 
