Audio Inpainting Method Based on Generative Adversarial Network (times cited: 3)

Speech gap inpainting with generation adversarial network

Authors: WANG Jie [1,2]; GUAN Yuansheng; HU Wenlin

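Affiliations: [1] School of Electronics and Communication Engineering, Guangzhou University, Guangzhou 510006, Guangdong, China; [2] National Engineering Laboratory for Digital Construction and Evaluation of Urban Rail Transit, China Railway Design Corporation, Tianjin 300308, China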

Source: Journal of Shaanxi Normal University (Natural Science Edition), 2022, No. 6, pp. 39-48 (10 pages)

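Funding: Open Project of the National Engineering Laboratory for Digital Construction and Evaluation of Urban Rail Transit (2021JZ02); National Natural Science Foundation of China (11974086); Guangzhou University Internal Research Project (YJ2021008); Guangzhou Science and Technology Plan Project (201904010468).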

Abstract: Existing audio inpainting methods can repair only short gaps, are limited to highly repetitive audio such as music, and, when they operate on spectrograms, suffer from distortion introduced by the inverse transform. To address these problems, a new generative adversarial network for inpainting long speech gaps is proposed. The network takes raw speech waveforms as both input and output, which removes the limitations of spectrogram-based models. First, a context encoder-decoder is used as the generator to make better use of the content available on both sides of the time-domain gap. Second, a speech feature extraction module is added to the discriminator; by learning pitch and phoneme features from the content before and after the gap, it improves both training efficiency and generation quality. Compared with several existing algorithms, objective and subjective evaluations show that the proposed network achieves good speech inpainting performance and can repair gaps of up to 256 ms. Furthermore, by extending the audio length through speed change, the model can stably repair speech gaps of up to 500 ms in the extended speech.
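The abstract gives no implementation details. As a rough illustration of what working directly on the waveform means for gap filling, the minimal Python sketch below (an assumption, not the authors' code) builds one training example: it zeroes out a gap in a raw signal and collects the context on both sides that a context encoder-decoder generator would condition on. The 16 kHz sample rate, the 500 ms context window, and the names `GapSample`, `ms_to_samples`, and `make_gap_sample` are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class GapSample:
    """Hypothetical training example for waveform-domain gap inpainting."""
    context_before: List[float]  # samples immediately preceding the gap
    context_after: List[float]   # samples immediately following the gap
    target: List[float]          # ground-truth samples the generator must reconstruct
    masked: List[float]          # full signal with the gap zeroed (generator input)


def ms_to_samples(duration_ms: float, sample_rate: int) -> int:
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(round(duration_ms * sample_rate / 1000.0))


def make_gap_sample(signal: Sequence[float], sample_rate: int,
                    gap_start_ms: float, gap_ms: float,
                    context_ms: float) -> GapSample:
    """Cut a gap out of a raw waveform and collect the surrounding context.

    Operates on time-domain samples only, so no spectrogram and no inverse
    transform is involved. The context length is an assumed parameter.
    """
    start = ms_to_samples(gap_start_ms, sample_rate)
    gap = ms_to_samples(gap_ms, sample_rate)
    ctx = ms_to_samples(context_ms, sample_rate)
    end = start + gap
    if start - ctx < 0 or end + ctx > len(signal):
        raise ValueError("not enough signal around the gap for the requested context")
    masked = list(signal)
    for i in range(start, end):
        masked[i] = 0.0  # erase the gap so the generator cannot see it
    return GapSample(
        context_before=list(signal[start - ctx:start]),
        context_after=list(signal[end:end + ctx]),
        target=list(signal[start:end]),
        masked=masked,
    )


if __name__ == "__main__":
    sr = 16000  # assumed sample rate
    # Two seconds of a 220 Hz tone stands in for a speech recording.
    tone = [math.sin(2 * math.pi * 220 * n / sr) for n in range(2 * sr)]
    sample = make_gap_sample(tone, sr, gap_start_ms=1000, gap_ms=256, context_ms=500)
    print(len(sample.target), len(sample.context_before))
```

At the assumed 16 kHz rate, a 256 ms gap spans 4096 samples and each 500 ms context window spans 8000 samples, so the generator must synthesize thousands of consecutive samples from the surrounding audio alone.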

Keywords: audio inpainting; generative adversarial network; context encoder-decoder; speech feature extraction

CLC classification: TB518 [Science / Physics]
