Multi-scale audio sequence generation method based on generative adversarial networks and feature fusion


Authors: Xu Huajie [1,2,3,4]; Zhang Bo

Affiliations: [1] College of Computer & Electronic Information, Guangxi University, Nanning 530004, China; [2] Guangxi Key Laboratory of Multimedia Communications & Network Technology, Guangxi University, Nanning 530004, China; [3] Key Laboratory of Parallel, Distributed & Intelligent Computing, Guangxi University, Nanning 530004, China; [4] Guangxi Intelligent Digital Services Research Center of Engineering Technology, Guangxi University, Nanning 530004, China

Source: Application Research of Computers (《计算机应用研究》), 2023, No. 9, pp. 2770-2774 (5 pages)

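Funding: National Natural Science Foundation of China (71963001); Science and Technology Program of Guangxi Zhuang Autonomous Region (2017AB15008); Science and Technology Program of Chongzuo City (FB2018001).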

Abstract: Insufficient audio data is a common problem in speech recognition, and the performance of a speech recognition model trained on limited data is difficult to guarantee. To address this, the paper proposes a multi-scale audio sequence generation method based on generative adversarial networks and feature fusion (multi-scale audio sequence GAN, MAS-GAN), which consists of a multi-scale audio sequence generator and a real/fake-category discriminator. The generator learns features of audio sequences in different time and frequency domains through three up-sampling sub-networks, and then fuses the features of different scales into a pseudo audio sequence. The discriminator uses an auxiliary classifier to distinguish generated fake data from real data, while also guiding the generator to produce data of each category. Experiments show that, compared with current mainstream audio sequence generation methods, the proposed method improves the IS and FID scores by 6.78% and 3.75% respectively and can generate higher-quality audio sequences. The quality of the generated audio sequences is also evaluated through classification experiments on the SC09 dataset, where the classification accuracy of the proposed method is 2.3% higher than that of the other methods.
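The abstract does not state the loss functions, so the following is only a minimal sketch of how a real/fake-category discriminator with an auxiliary classifier is commonly trained, assuming an AC-GAN-style objective (a real/fake term plus a class term). All function and parameter names here (`bce`, `class_ce`, `discriminator_loss`, `generator_loss`, `real_prob`, `fake_logits`, and so on) are hypothetical and are not taken from the paper.

```python
import math


def bce(p, target):
    """Binary cross-entropy for one predicted probability p."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_ce(logits, label):
    """Cross-entropy of the auxiliary classifier for one sample."""
    return -math.log(softmax(logits)[label])


def discriminator_loss(real_prob, real_logits, real_label,
                       fake_prob, fake_logits, fake_label):
    """Assumed AC-GAN-style objective: real/fake term plus category term."""
    source = bce(real_prob, 1) + bce(fake_prob, 0)
    category = class_ce(real_logits, real_label) + class_ce(fake_logits, fake_label)
    return source + category


def generator_loss(fake_prob, fake_logits, fake_label):
    """The generator wants fakes judged real and assigned the requested class."""
    return bce(fake_prob, 1) + class_ce(fake_logits, fake_label)
```

In this sketch the category term is what lets the discriminator steer the generator toward a requested class: a generated sample that fools the real/fake output but is assigned the wrong class still incurs a large `generator_loss`.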

Keywords: audio sequence generation; generative adversarial network; semi-supervised learning; feature fusion

Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
