Short-time acoustic scene recognition method using multi-scale feature fusion

Original title: 融合多尺度特征的短时音频场景识别方法

Cited by: 1

Authors: WANG Meng (王猛); ZHANG Pengyuan (张鹏远) [1,2]

Affiliations: [1] Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190; [2] University of Chinese Academy of Sciences, Beijing 100049

Source: Acta Acustica (《声学学报》), 2022, No. 6, pp. 717-726 (10 pages)

Funding: Supported by the National Natural Science Foundation of China (62071461).

Abstract: To address the poor recognition performance in short-time acoustic scene recognition, a method using multi-scale feature fusion is proposed. First, the sum and difference of the left and right channels of the stereo audio are taken as input, and a long frame length is used for framing so that the extracted frame-level features contain enough audio information. Then, the features are fed frame by frame into a one-dimensional convolutional neural network that fuses multi-scale features, making full use of the shallow, middle, and deep embeddings at different scales in the network. Finally, all frame-level soft labels are combined to obtain the scene label of the short audio clip. Experimental results show that the method reaches an accuracy of 79.02% on the Detection and Classification of Acoustic Scenes and Events (DCASE) 2021 short-time audio scene dataset, the best performance reported on this dataset so far.
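The pre-processing and decision steps named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the frame and hop lengths, and the use of a plain mean to combine frame-level soft labels are all assumptions made for illustration, since the abstract does not specify them. The one-dimensional convolutional network itself is omitted.

```python
import numpy as np


def sum_difference(stereo):
    """Turn a (num_samples, 2) stereo signal into [L + R, L - R] channels."""
    left, right = stereo[:, 0], stereo[:, 1]
    return np.stack([left + right, left - right], axis=1)


def frame_signal(x, frame_len, hop_len):
    """Split (num_samples, channels) into (num_frames, frame_len, channels).

    frame_len and hop_len are hypothetical parameters; the abstract only
    says that a long frame length is used.
    """
    if x.shape[0] < frame_len:
        raise ValueError("signal is shorter than one frame")
    num_frames = 1 + (x.shape[0] - frame_len) // hop_len
    starts = np.arange(num_frames) * hop_len
    return np.stack([x[s:s + frame_len] for s in starts])


def aggregate_soft_labels(frame_probs):
    """Combine frame-level class probabilities into one clip-level label.

    Averaging is an assumption; the abstract only says the frame-level
    soft labels are integrated.
    """
    clip_probs = frame_probs.mean(axis=0)
    return int(np.argmax(clip_probs)), clip_probs
```

In a full pipeline, each frame returned by `frame_signal` would be converted to features and passed through the network, and the resulting per-frame probabilities would be handed to `aggregate_soft_labels`.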

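Keywords: convolutional neural network; multi-scale features; scene recognition; left and right channels; acoustic scene classification; state-of-the-art performance; frame processing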

Classification code: TN912.34 [Electronics and Telecommunications: Communication and Information Systems]
