Affiliation: [1] College of Information Science and Technology, Donghua University, Shanghai
Source: Computer Science and Application, 2023, No. 5, pp. 995-1005 (11 pages)
Abstract: To improve the feature-extraction ability and performance of low-complexity neural networks for audio scene recognition, this paper studies audio scene classification with convolutional neural networks (CNNs) as the main method. A separate attention mapping layer is added to the traditional CNN structure and improved, two attention mechanisms suitable for lightweight convolutional networks are refined and compared, and depthwise separable convolutions are adopted in some convolutional layers to reduce the overall parameter count. The original convolutions are further replaced with low-cost grouped strip convolutions, and a time-frequency separation scheme is used to design the overall convolution, yielding the proposed SFAC (Sequence Frequency Attention CNN) network model. SFAC is compared with several VGG-style baseline convolutional networks on multi-class acoustic scene datasets (TAU Urban Acoustic Scenes and UrbanSound8K). The results show that the proposed network achieves higher accuracy than the baselines while maintaining low complexity.
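The parameter savings that motivate the abstract's use of depthwise separable and grouped strip convolutions can be sketched with a quick count. This is a minimal illustration; the channel count, kernel size, and group count below are assumptions for demonstration, not values taken from the paper.

```python
def std_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out

def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Depthwise k x k conv (one filter per input channel)
    followed by a pointwise 1 x 1 conv."""
    return k * k * c_in + c_in * c_out

def grouped_strip_params(c_in: int, c_out: int, k: int, groups: int = 1) -> int:
    """Strip (asymmetric) factorization: a k x 1 pass then a 1 x k pass,
    with channels split into `groups` independent groups."""
    per_pass = k * (c_in // groups) * (c_out // groups) * groups
    return 2 * per_pass

# Illustrative sizes (assumed): 64 input/output channels, 3 x 3 kernels.
c = 64
print(std_conv_params(c, c, 3))                   # 36864
print(depthwise_separable_params(c, c, 3))        # 4672
print(grouped_strip_params(c, c, 3, groups=4))    # 6144
```

Under these assumed sizes, both lightweight variants need well under a fifth of the standard convolution's weights, which is why such substitutions keep network complexity low.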
Keywords: audio scene recognition; convolutional neural network; attention convolution; asymmetric convolution; channel attention
Classification: TN9 [Electronics and Telecommunications: Information and Communication Engineering]