Authors: WANG Jie (王杰) [1,2]; GUAN Yuansheng (观元升); HU Wenlin (胡文林)
Affiliations: [1] School of Electronics and Communication Engineering, Guangzhou University, Guangzhou 510006, Guangdong, China; [2] National Engineering Laboratory for Digital Construction and Evaluation of Urban Rail Transit, China Railway Design Corporation, Tianjin 300308, China
Source: Journal of Shaanxi Normal University (Natural Science Edition) (《陕西师范大学学报(自然科学版)》), 2022, No. 6, pp. 39-48 (10 pages)
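Funding: Open Project of the National Engineering Laboratory for Digital Construction and Evaluation of Urban Rail Transit (2021JZ02); National Natural Science Foundation of China (11974086); Guangzhou University Research Project (YJ2021008); Guangzhou Science and Technology Program (201904010468).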
Abstract: Existing audio inpainting methods can repair only short segments, are largely limited to highly repetitive audio such as music, and suffer from inverse-transform distortion when they operate on spectrograms. To address these problems, a new generative adversarial network (GAN) for long speech inpainting is proposed. The network takes raw speech signals as both input and output, which avoids the limitations of spectrogram-based methods. First, a context encoder-decoder is used as the generator to make better use of the content available around the time-domain gap. Second, a speech feature extraction module is added to the discriminator; by learning pitch and phoneme features from the surrounding context, it improves training efficiency and generation quality. Objective and subjective evaluations against several existing algorithms show that the proposed GAN achieves good speech inpainting performance and can repair gaps of up to 256 ms. Furthermore, when the audio is lengthened by changing its speed, the new model can stably repair gaps of up to 500 ms in the extended speech.