Environmental Sound Classification Based on Multi-channel Features and Mixed Attention

Cited by: 2

Authors: ZHOU Shuai, LI Li [1,2], PENG Zhang-jun, HUANG Peng-cheng

Affiliations: [1] School of Computer Science and Technology, Southwest University of Science and Technology, Mianyang 621000, Sichuan, China; [2] Sichuan Autonomous Controllable Artificial Intelligence Engineering Technology Center, Mianyang 621000, Sichuan, China

Source: Computer Technology and Development, 2023, No. 8, pp. 43-50 (8 pages)

Funding: National Natural Science Foundation of China (U21A20157); National Key Research and Development Program of China (2019YFB1310501).

Abstract: Environmental sound classification (ESC) has become an important research direction. The task is difficult, however, because environmental sounds are highly varied, cannot be represented in a uniform way, and are easily corrupted by noise. To improve recognition accuracy on ESC tasks, a classification method based on multi-channel features and a mixed attention model is proposed. First, the sound signal is transformed into the time-frequency domain, spectral features are extracted with several kinds of filters, and the results are reconstructed into a three-channel feature map. Multi-channel features exploit the complementarity between features and compensate for the limited representational power of any single feature. Second, a hybrid classification model composed of channel attention and time-frequency attention modules is introduced. The channel attention module processes the feature map and assigns a weight to each channel, so that channels carrying more useful information and discriminating the given sound class better receive larger weights. The time-frequency attention module focuses on the more informative regions of the time and frequency domains. The method suppresses background noise, removes redundancy, and improves convergence speed and classification accuracy. Comparative experiments show recognition accuracies of 96.25% on ESC-10 and 89.56% on ESC-50, and 98.40% on UrbanSound8K.
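The abstract does not give the layer-level design, so the following is only a minimal sketch of the two ideas it names: stacking several spectral feature maps into one multi-channel input, then reweighting it first per channel and then per time-frequency bin. Everything in it is an assumption made for illustration: the function names `stack_channels`, `channel_attention`, and `time_frequency_attention` are hypothetical, the softmax and sigmoid gates replace whatever learned layers the paper actually uses, and the tiny 2x2 maps stand in for real filter-bank spectrograms.

```python
import math


def stack_channels(*maps):
    """Stack equally shaped 2-D maps (freq x time) into a [C][F][T] nested list."""
    shape = (len(maps[0]), len(maps[0][0]))
    for m in maps:
        if (len(m), len(m[0])) != shape:
            raise ValueError("all feature maps must share one (freq, time) shape")
    return [[row[:] for row in m] for m in maps]


def sigmoid(z):
    # tanh form of the logistic function; avoids overflow for large |z|
    return 0.5 * (1.0 + math.tanh(0.5 * z))


def channel_attention(x):
    """Scale each channel by a softmax over its global average.

    Assumption: a parameter-free stand-in for a learned channel-attention block.
    """
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]
    peak = max(pooled)
    exps = [math.exp(p - peak) for p in pooled]
    total = sum(exps)
    weights = [e / total for e in exps]
    scaled = [[[v * w for v in row] for row in ch] for ch, w in zip(x, weights)]
    return scaled, weights


def time_frequency_attention(x):
    """Gate every bin by sigmoid(mean of its frequency row) * sigmoid(mean of its time frame).

    Assumption: a parameter-free stand-in for a learned time-frequency block.
    """
    n_ch, n_freq, n_time = len(x), len(x[0]), len(x[0][0])
    freq_gate = [
        sigmoid(sum(x[c][f][t] for c in range(n_ch) for t in range(n_time)) / (n_ch * n_time))
        for f in range(n_freq)
    ]
    time_gate = [
        sigmoid(sum(x[c][f][t] for c in range(n_ch) for f in range(n_freq)) / (n_ch * n_freq))
        for t in range(n_time)
    ]
    return [
        [[x[c][f][t] * freq_gate[f] * time_gate[t] for t in range(n_time)] for f in range(n_freq)]
        for c in range(n_ch)
    ]


# Toy input: three hypothetical feature maps with 2 frequency bins and 2 frames each.
feat_a = [[1.0, 2.0], [3.0, 4.0]]
feat_b = [[0.0, 0.0], [0.0, 0.0]]
feat_c = [[1.0, 1.0], [1.0, 1.0]]

features = stack_channels(feat_a, feat_b, feat_c)
attended, weights = channel_attention(features)
output = time_frequency_attention(attended)
```

In this toy run the channel with the largest average energy receives the largest weight and the all-zero channel stays zero. A trained model would learn its weights from data instead of deriving them from raw averages.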

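Keywords: environmental sound classification; multi-channel features; channel attention; time-frequency attention; mixed attention model; deep model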

Classification number: TP391.42 [Automation and Computer Technology: Computer Application Technology]
