Authors: Xu Huajie [1,2,3,4]; Zhang Bo
Affiliations: [1] College of Computer & Electronic Information, Guangxi University, Nanning 530004, China [2] Guangxi Key Laboratory of Multimedia Communications & Network Technology, Guangxi University, Nanning 530004, China [3] Key Laboratory of Parallel, Distributed & Intelligent Computing, Guangxi University, Nanning 530004, China [4] Guangxi Intelligent Digital Services Research Center of Engineering Technology, Guangxi University, Nanning 530004, China
Source: Application Research of Computers (《计算机应用研究》), 2023, No. 9, pp. 2770-2774 (5 pages)
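Funding: National Natural Science Foundation of China (71963001); Guangxi Zhuang Autonomous Region Science and Technology Program (2017AB15008); Chongzuo Science and Technology Program (FB2018001).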
Abstract: Insufficient audio data is a common problem in speech recognition, and a recognition model trained on too little data cannot be relied on to perform well. This paper therefore proposes a multi-scale audio sequence generation method based on a generative adversarial network and feature fusion (multi-scale audio sequence GAN, MAS-GAN), consisting of a multi-scale audio sequence generator and a real/fake-category discriminator. The generator learns features of audio sequences in different time and frequency domains through three up-sampling sub-networks and then fuses the features of the different scales into a pseudo audio sequence. The discriminator uses an auxiliary classifier to distinguish generated fake data from real data while guiding the generator to produce data of each category. Experiments show that, compared with current mainstream audio sequence generation methods, the proposed method improves the IS and FID scores by 6.78% and 3.75% respectively and generates higher-quality audio sequences. The quality of the generated sequences is also evaluated through classification experiments on the SC09 dataset, where the classification accuracy of the proposed method is about 2.3% higher than that of the other methods.
Keywords: audio sequence generation; generative adversarial network; semi-supervised learning; feature fusion
Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]
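Note: the multi-scale fusion idea behind the generator in the abstract can be illustrated with a toy sketch. This is not the MAS-GAN implementation: the paper's sub-networks are learned, whereas here each of three hypothetical branches simply up-samples a short latent vector by repetition and smooths it with a different window, and the branches are fused by element-wise averaging. The function names, window widths, and averaging rule are all assumptions made for illustration.

```python
import random


def upsample_nearest(seq, factor):
    """Repeat every sample `factor` times (nearest-neighbour up-sampling)."""
    return [x for x in seq for _ in range(factor)]


def smooth(seq, width):
    """Moving average over a window of `width` samples, truncated at the edges."""
    half = width // 2
    n = len(seq)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        out.append(sum(seq[lo:hi]) / (hi - lo))
    return out


def branch(latent, target_len, width):
    """One hypothetical up-sampling branch: up-sample, then smooth at one scale."""
    if target_len % len(latent) != 0:
        raise ValueError("target_len must be a multiple of len(latent)")
    factor = target_len // len(latent)
    return smooth(upsample_nearest(latent, factor), width)


def generate(latent, target_len=64, widths=(1, 5, 17)):
    """Fuse three branches with different time scales by element-wise averaging."""
    branches = [branch(latent, target_len, w) for w in widths]
    return [sum(vals) / len(vals) for vals in zip(*branches)]


if __name__ == "__main__":
    rng = random.Random(0)
    latent = [rng.uniform(-1.0, 1.0) for _ in range(8)]
    fake = generate(latent)
    print(len(fake))  # 64 samples in the fused pseudo sequence
```

A small window keeps sharp, short-time detail and a large window keeps only the slow trend, so the fused sequence mixes information from several time scales. In the paper that role is played by trained up-sampling sub-networks rather than fixed filters.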