Authors: DONG Anming, LIU Zongyin, YU Jiguo, HAN Yubing, ZHOU You
Affiliations: [1] Big Data Institute, Qilu University of Technology, Jinan, Shandong 250353, China; [2] School of Mathematics and Statistics, Qilu University of Technology, Jinan, Shandong 250353, China; [3] School of Computer Science and Technology, Qilu University of Technology, Jinan, Shandong 250353, China; [4] Shandong HiCon New Media Institute Company Limited, Jinan, Shandong 250013, China
Source: Journal of Computer Applications, 2022, No. S01, pp. 54-58 (5 pages)
Funding: National Key Research and Development Program of China (2017YFB1400500); Key Research and Development Program of Shandong Province (2019JZZY020124); Natural Science Foundation of Shandong Province (ZR2017BF012); Youth Innovation Team Development Program of Shandong Higher Education Institutions (2019KJN010); Basic Research Enhancement Program for the Computer Science and Technology Discipline of Qilu University of Technology (Shandong Academy of Sciences) (2021JC02014); Talent Cultivation Improvement Program for the Computer Science and Technology Discipline of Qilu University of Technology (Shandong Academy of Sciences) (2021PY05001).
Abstract: With the rapid development of the online music industry, the demand for building automatic music retrieval and classification systems is increasing. Correct annotation of music genres by computer is an important prerequisite for accurate classification of music types and for guaranteeing the performance of music recommendation systems. To address the problem that convolutional operations cannot extract global representations, so that deep convolutional neural networks are weak at global modeling of music genre data, an automatic music genre classification method based on the Vision Transformer (ViT) neural network was proposed. After preprocessing the audio to be classified, a Short-Time Fourier Transform (STFT) was used to convert it into uniformly sized spectrogram slices, realizing the conversion of the music into frequency-domain features. To avoid overfitting during training, the spectrogram slice set was augmented by adding white noise. The generated spectrogram slices and their augmented counterparts were then used to train the constructed ViT network, thereby realizing automatic classification of music genres. Simulation results show that the constructed ViT network achieves a test recognition accuracy of 91.01% on the public music genre classification dataset GTZAN, which is 1.00 to 5.00 percentage points higher than traditional Convolutional Neural Network (CNN) based music genre classification methods such as AlexNet, AlexNet-enhanced, and VGG16.
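A minimal sketch of the preprocessing pipeline summarized in the abstract (STFT spectrogram slicing followed by white-noise augmentation) is given below for illustration. It assumes the librosa and numpy libraries; the function names and all parameter values (n_fft, hop_length, slice width, noise level) are illustrative assumptions, not the settings used in the paper.

import numpy as np
import librosa

def audio_to_spectrogram_slices(path, n_fft=1024, hop_length=512,
                                slice_frames=128, sr=22050):
    # Load the audio, compute its log-magnitude STFT spectrogram,
    # and cut it into fixed-width slices along the time axis.
    y, sr = librosa.load(path, sr=sr, mono=True)
    spec = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop_length))
    log_spec = librosa.amplitude_to_db(spec, ref=np.max)
    n_slices = log_spec.shape[1] // slice_frames
    return [log_spec[:, i * slice_frames:(i + 1) * slice_frames]
            for i in range(n_slices)]

def augment_with_white_noise(slices, noise_std=1.0, copies=1, seed=0):
    # Create augmented copies of each slice by adding Gaussian white noise;
    # the noise level (in dB units of the log spectrogram) is an assumption.
    rng = np.random.default_rng(seed)
    augmented = [s + rng.normal(0.0, noise_std, size=s.shape)
                 for s in slices for _ in range(copies)]
    return slices + augmented

The combined slice set (original plus augmented) would then be fed to a ViT classifier over the spectrogram images, as the abstract describes.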
Keywords: Vision Transformer (ViT) network; music genre; feature conversion; spectrogram; deep learning; data augmentation
Classification codes: TP391.3 (Automation and Computer Technology: Computer Application Technology); TP183 (Automation and Computer Technology: Computer Science and Technology); J609.9 (Art: Music)

