Polyphonic Sound Event Detection Based on Self-Attention Routing Capsule Network


Authors: LI Haitao (李海涛), YANG Shuguo (杨树国) [1]

Affiliation: [1] College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, Shandong 266061, China

Source: Journal of Qingdao University of Science and Technology (Natural Science Edition), 2022, No. 5, pp. 121-126 (6 pages)

Funding: Natural Science Foundation of Shandong Province (ZR2021QF040).

Abstract: Sound event detection is an important problem in machine listening (computer audition), and polyphonic sound event detection, in which several events may overlap in time, is one of its most challenging research topics. Building on the recently proposed non-iterative self-attention routing method and on capsule networks, this paper proposes a multi-path capsule network model with self-attention routing and applies it to polyphonic sound event detection. Because self-attention routing is non-iterative and highly parallel, it greatly accelerates model training. The multi-path primary capsule layer uses asymmetric convolution kernels of different sizes, which lets the model capture information at different resolutions while largely preserving temporal information, thereby improving performance. The proposed model and method are compared and evaluated on the Task 4 dataset of the 2017 Detection and Classification of Acoustic Scenes and Events challenge (DCASE 2017). The F-score on the audio tagging subtask reaches 59.5%, and the error rate on the sound event detection subtask drops to 0.72, a substantial improvement in detection performance. The results indicate that the method offers high event detection accuracy, fast speed, and strong generalization ability.
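The abstract does not give the routing equations, so the following is only a rough illustration of what "non-iterative self-attention routing" means. It is modeled on the self-attention routing of Efficient-CapsNet (Mazzia et al., 2021); whether the authors use exactly this variant (scaled dot-product agreement, softmax over output capsules, an additive learnable prior `bias`) is an assumption. The names `squash`, `self_attention_routing`, `u_hat`, and `bias` are hypothetical, the prediction vectors are assumed to be precomputed as W_ij u_i, and plain Python lists are used to keep the sketch dependency-free.

```python
import math


def squash(s, eps=1e-7):
    """Capsule non-linearity: keeps the direction of s, maps its length into [0, 1)."""
    norm_sq = sum(x * x for x in s)
    scale = norm_sq / (1.0 + norm_sq) / (math.sqrt(norm_sq) + eps)
    return [scale * x for x in s]


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def self_attention_routing(u_hat, bias=None):
    """Single-pass routing from N_in primary capsules to N_out output capsules.

    u_hat[j][i]: d-dimensional prediction of input capsule i for output capsule j
                 (assumed already computed as W_ij @ u_i).
    bias[j][i]:  stand-in for a learnable routing prior (zeros if omitted).
    Returns the N_out squashed output capsule vectors v_j.
    """
    n_out, n_in, d = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    if bias is None:
        bias = [[0.0] * n_in for _ in range(n_out)]

    # Self-attention agreement: how well prediction i agrees with all
    # predictions for the same output j, scaled by sqrt(d).
    a = [[sum(dot(u_hat[j][i], u_hat[j][k]) for k in range(n_in)) / math.sqrt(d)
          for i in range(n_in)] for j in range(n_out)]

    # Coupling coefficients: for each input capsule i, softmax over outputs j.
    c = [[0.0] * n_in for _ in range(n_out)]
    for i in range(n_in):
        probs = softmax([a[j][i] for j in range(n_out)])
        for j in range(n_out):
            c[j][i] = probs[j] + bias[j][i]

    # Weighted sum of predictions, then squash; no routing-by-agreement loop.
    outputs = []
    for j in range(n_out):
        s = [sum(c[j][i] * u_hat[j][i][t] for i in range(n_in)) for t in range(d)]
        outputs.append(squash(s))
    return outputs


if __name__ == "__main__":
    # Toy input: output 0 receives three agreeing predictions,
    # output 1 receives conflicting ones.
    u_hat = [
        [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]],
        [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]],
    ]
    for j, v in enumerate(self_attention_routing(u_hat)):
        print(j, round(math.sqrt(sum(x * x for x in v)), 3))
```

On this toy input the output capsule fed by agreeing predictions ends up with length about 0.87, while the one fed by conflicting predictions stays near 0.04. Every coupling coefficient depends only on dot products among the current predictions, so all of them can be computed in one pass (in a real model, as batched matrix products) instead of through the repeated agreement updates of dynamic routing; this is the property the abstract credits for the faster training.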

Keywords: polyphonic sound event detection; capsule network; DCASE 2017 challenge

CLC number: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
