Authors: FEI Hongbo (费鸿博); WU Weiguan (吴伟官); LI Ping (李平); CAO Yi (曹毅)
Affiliations: [1] School of Mechanical Engineering, Jiangnan University, Wuxi 214122, Jiangsu, China; [2] Jiangsu Key Laboratory of Advanced Food Manufacturing Equipment and Technology (Jiangnan University), Wuxi 214122, Jiangsu, China
Source: Journal of Harbin Institute of Technology (《哈尔滨工业大学学报》), 2022, No. 5, pp. 124-130, 123 (8 pages)
Funds: Higher Education Discipline Innovation and Talent Introduction Program (B18027); Six Talent Peaks Project of Jiangsu Province (ZBZZ-012); Jiangsu Provincial Excellent Science and Technology Innovation Team Fund (2019SK07)
Abstract: Existing spectrogram-separation methods achieve limited accuracy in acoustic scene classification. To address this problem, an acoustic scene classification method based on Mel-spectrogram separation and a long-distance self-calibration convolutional neural network (LSCNet) was proposed. First, the principle of harmonic/percussive source separation of spectrograms was introduced, and a Mel-spectrogram separation algorithm was proposed to decompose the Mel-spectrogram into harmonic, percussive, and residual components. Then, combining a self-calibrated convolutional network with a residual enhancement mechanism, the LSCNet was designed. The model adopts a frequency-domain self-calibration algorithm and a long-distance enhancement mechanism to retain the original information of the feature map, strengthens the correlation between deep and shallow features through residual enhancement and channel attention enhancement mechanisms, and incorporates a multi-scale feature fusion module to further extract effective information from the output layers during training, thereby improving classification accuracy. Finally, acoustic scene classification experiments were conducted on the UrbanSound8K and ESC-50 datasets. The results show that the residual component of the Mel-spectrogram specifically reduces the influence of background noise and thus yields better classification performance, and that LSCNet attends to the frequency-domain information in the feature map; its best classification accuracies reached 90.1% and 88%, respectively, verifying the effectiveness of the proposed method.
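The Mel-spectrogram separation step described in the abstract can be illustrated with a short Python sketch. The code below uses librosa's median-filter-based harmonic/percussive source separation (HPSS) with a soft margin and treats whatever is not assigned to the harmonic or percussive part as the residual component. The file name and the n_mels and margin values are illustrative assumptions; this is an approximation of the general idea, not the authors' exact algorithm.

    # Sketch only: decompose a Mel-spectrogram into harmonic, percussive, and
    # residual components via librosa's median-filter HPSS. Parameter values
    # and the input file name are assumptions, not the paper's settings.
    import librosa
    import numpy as np

    y, sr = librosa.load("scene.wav", sr=None)      # hypothetical input clip

    # Power Mel-spectrogram of the clip
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)

    # With margin > 1 the soft masks become selective, so the harmonic and
    # percussive parts no longer sum to the input; the leftover energy is
    # taken as the residual component.
    harm, perc = librosa.decompose.hpss(mel, margin=2.0)
    residual = np.maximum(mel - harm - perc, 0.0)   # clamp tiny negatives

    # Log-scale each component before feeding it to a CNN classifier
    log_harm = librosa.power_to_db(harm, ref=np.max)
    log_perc = librosa.power_to_db(perc, ref=np.max)
    log_res = librosa.power_to_db(residual, ref=np.max)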
Keywords: acoustic scene classification; Mel-spectrogram separation algorithm; long-distance self-calibration convolutional neural network (LSCNet); frequency-domain self-calibration algorithm; multi-scale feature fusion
CLC number: TP391.42 [Automation and Computer Technology: Computer Application Technology]
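For context on the self-calibration idea that LSCNet builds on, the following is a minimal PyTorch sketch of a generic self-calibrated convolution block in the style of SCNet: a down-sampled context branch produces a sigmoid gate that modulates the full-resolution features. It is not the paper's frequency-domain self-calibration or long-distance enhancement mechanism; the channel count and pooling ratio are illustrative assumptions.

    # Sketch only: generic self-calibrated convolution (SCNet-style), not the
    # paper's LSCNet block. Channels and pooling ratio are assumptions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SelfCalibratedConv(nn.Module):
        def __init__(self, channels: int, pooling_r: int = 4):
            super().__init__()
            # Low-resolution context branch: pool, convolve, normalize
            self.k2 = nn.Sequential(
                nn.AvgPool2d(kernel_size=pooling_r, stride=pooling_r),
                nn.Conv2d(channels, channels, 3, padding=1, bias=False),
                nn.BatchNorm2d(channels),
            )
            # Full-resolution feature branch
            self.k3 = nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1, bias=False),
                nn.BatchNorm2d(channels),
            )
            # Output transform
            self.k4 = nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            identity = x
            # Upsample the pooled context to the input size and build a gate
            gate = torch.sigmoid(
                identity + F.interpolate(self.k2(x), size=identity.shape[2:])
            )
            out = self.k3(x) * gate      # calibrate full-resolution features
            return self.k4(out)

    # Example: a (batch, channels, mel_bins, frames) feature map
    # feats = torch.randn(4, 64, 128, 431)
    # out = SelfCalibratedConv(64)(feats)   # same shape as the input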