An Improved Audio Classification Method Based on Parameter-Free Attention Combined with Self-Supervision

Authors: Gong Xuchao; Li Zongmin [1]

Affiliations: [1] School of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580; [2] Information Technology Service Center of Sinopec Shengli Petroleum Administration Co., Ltd., Dongying 257000

Source: Journal of Computer-Aided Design & Computer Graphics, 2023, No. 3, pp. 434-440 (7 pages)

Funding: National Key R&D Program of China (2019YF0301800); National Natural Science Foundation of China (61379106).

Abstract: End-to-end audio classification methods based on the transformer have been shown to outperform two-dimensional convolution in many scenarios. Commonly used transformer-based audio classification methods attend only to the importance of features across different time steps and do not adequately characterize the relative importance of features within the same time step. To address this, a method that combines parameter-free attention with self-supervised feature construction is proposed to improve audio classification. A parameter-free multi-local-extremum attention mechanism is constructed over the features of each time step to fit their multi-local-extremum distribution and thereby characterize feature importance within a time step. In addition, the input audio spectrogram is randomly masked in the time and frequency domains to introduce self-supervision information, so that the model effectively learns both the details of the audio spectrum and the classification information. Comparative experiments on the AudioSet, ESC-50, and Speech Commands datasets show that the proposed algorithm improves recognition accuracy over the baseline method by 0.46% to 1.20%.
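The random time- and frequency-domain masking described in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the function names `mask_spectrogram` and `masked_reconstruction_loss`, the single-band masking policy, the zero fill value, and the use of a mean-squared reconstruction error on the masked cells as the self-supervision signal are all assumptions made here for illustration, since the abstract does not specify these details.

```python
import random


def mask_spectrogram(spec, max_time_width, max_freq_width, rng=None, fill=0.0):
    """Hide one random band of time frames and one of frequency bins.

    `spec` is a list of rows indexed as spec[freq][time]. Returns
    (masked_copy, mask), where mask[f][t] is True for hidden cells.
    The input is not modified. (Hypothetical helper, not from the paper.)
    """
    rng = rng or random.Random()
    n_freq, n_time = len(spec), len(spec[0])

    # Draw the width and start of each band; a width of 0 means "no mask".
    t_width = rng.randint(0, min(max_time_width, n_time))
    t_start = rng.randint(0, n_time - t_width)
    f_width = rng.randint(0, min(max_freq_width, n_freq))
    f_start = rng.randint(0, n_freq - f_width)

    mask = [
        [
            (t_start <= t < t_start + t_width) or (f_start <= f < f_start + f_width)
            for t in range(n_time)
        ]
        for f in range(n_freq)
    ]
    masked = [
        [fill if mask[f][t] else spec[f][t] for t in range(n_time)]
        for f in range(n_freq)
    ]
    return masked, mask


def masked_reconstruction_loss(pred, target, mask):
    """Mean squared error over masked cells only (assumed training signal)."""
    errors = [
        (p - t) ** 2
        for p_row, t_row, m_row in zip(pred, target, mask)
        for p, t, m in zip(p_row, t_row, m_row)
        if m
    ]
    return sum(errors) / len(errors) if errors else 0.0


# Usage: a toy 4-bin x 8-frame "spectrogram" with a seeded generator.
rng = random.Random(0)
spec = [[float(f * 10 + t) for t in range(8)] for f in range(4)]
masked, mask = mask_spectrogram(spec, max_time_width=3, max_freq_width=2, rng=rng)
loss_if_perfect = masked_reconstruction_loss(spec, spec, mask)
```

In a training loop under these assumptions, a model would receive `masked` as input and a loss of this kind would presumably be added to the classification loss, so that the network is encouraged to recover spectral detail while learning class labels.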

Keywords: transformer; attention mechanism; self-supervision; audio classification

Classification number: TP391.41 [Automation and Computer Technology: Computer Application Technology]

