Research on Audio Scene Recognition Based on Lightweight Convolutional Neural Network

Authors: Mao Kexiang, Xie Yinghua[1]

Affiliation: [1] College of Information Science and Technology, Donghua University, Shanghai

Source: Computer Science and Application (《计算机科学与应用》), 2023, No. 5, pp. 995-1005 (11 pages)

Abstract: To improve the feature-extraction ability and performance of low-complexity neural networks for audio scene recognition, this paper studies audio scene classification using convolutional neural networks (CNNs) as the main method. A separate attention-mapping layer is added to the conventional CNN structure and improved; two attention mechanisms suited to lightweight convolutional networks are improved and compared; and depthwise separable convolution is used in some convolutional layers to reduce the parameter count of the overall network. The original convolutions are replaced with lower-cost grouped strip convolutions, and the overall convolution design follows a time-frequency separation approach. The resulting model is named SFAC (Sequence Frequency Attention CNN). SFAC is compared with several VGG-based baseline CNN models on multi-class audio scene datasets (TAU Urban Acoustic Scenes, UrbanSound8K). The results show that the proposed network achieves higher accuracy than the baseline models while keeping complexity low.
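The abstract attributes SFAC's low parameter count to depthwise separable convolution and to grouped strip convolutions arranged by time-frequency separation, but gives no layer sizes. The following is a minimal sketch of the parameter arithmetic behind those two ideas, not the authors' implementation: the function names are hypothetical, and the channel count (64), kernel size (3), group count (4), bias-free layers, and the frequency-then-time ordering of the strip kernels are all assumptions chosen for illustration.

```python
# Parameter-count arithmetic for the convolution types named in the abstract.
# Illustrative only: channel counts, kernel size, group count, and the
# frequency-then-time ordering of the strip kernels are assumptions, not SFAC's settings.


def conv2d_params(c_in, c_out, kh, kw, groups=1, bias=False):
    """Weight count of a 2-D convolution; each output channel sees c_in // groups inputs."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    weights = c_out * (c_in // groups) * kh * kw
    return weights + (c_out if bias else 0)


def depthwise_separable_params(c_in, c_out, k):
    """k x k depthwise convolution (groups == c_in) followed by a 1 x 1 pointwise convolution."""
    depthwise = conv2d_params(c_in, c_in, k, k, groups=c_in)
    pointwise = conv2d_params(c_in, c_out, 1, 1)
    return depthwise + pointwise


def grouped_strip_params(c_in, c_out, k, groups):
    """Hypothetical time-frequency separated block: a grouped k x 1 kernel along the
    frequency axis, then a grouped 1 x k kernel along the time axis
    (input assumed to be laid out as frequency x time)."""
    freq = conv2d_params(c_in, c_out, k, 1, groups=groups)
    time = conv2d_params(c_out, c_out, 1, k, groups=groups)
    return freq + time


if __name__ == "__main__":
    c, k = 64, 3
    print("standard 3x3:       ", conv2d_params(c, c, k, k))                # 36864
    print("depthwise separable:", depthwise_separable_params(c, c, k))       # 4672
    print("grouped strip (g=4):", grouped_strip_params(c, c, k, groups=4))  # 6144
```

Under these assumed sizes, a standard 3 x 3 convolution needs 64 x 64 x 9 = 36,864 weights, the depthwise separable version needs 64 x 9 + 64 x 64 = 4,672, and the four-group strip pair needs 2 x 64 x 16 x 3 = 6,144. The actual savings in SFAC depend on layer configurations the abstract does not state.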

Keywords: audio scene recognition; convolutional neural network; attention convolution; irregular-shaped convolution; channel attention

CLC number: TN9 [Electronics and Telecommunications: Information and Communication Engineering]

 
