Authors: WANG Yuwei (王玉伟), CHEN Aibin (陈爱斌) [1], ZHOU Guoxiong (周国雄) [1], ZHANG Zhiqiang (张志强) [2]
Affiliations: [1] Institute of Applied Artificial Intelligence, Central South University of Forestry and Technology, Changsha 410004, China; [2] Wildlife Conservation and Utilization Laboratory, Central South University of Forestry and Technology, Changsha 410004, China
Source: Journal of Harbin University of Science and Technology (《哈尔滨理工大学学报》), 2024, No. 6, pp. 61-73 (13 pages)
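Funding: National Natural Science Foundation of China (62276276); Graduate Science and Technology Innovation Fund of Central South University of Forestry and Technology (cx202202083).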
Abstract: Birdsong recordings collected in natural environments suffer from misaligned lengths, redundancy, noise, and large intra-class differences. To address these problems, this paper proposes an automatic birdsong recognition model that combines a two-stage hash algorithm based on multi-level attention with a lightweight classifier trained with a fused contrastive loss. The first stage of the hash algorithm addresses redundancy and noise: it splits the log-Mel spectrogram into fragments, computes the self-attention between fragments, extracts the resulting multi-level self-attention weight matrices, and then uses the weight matrix, weighted by a custom noise-suppression coefficient, to trim redundant and noisy fragments from the input. The second stage addresses misaligned input dimensions: it uses a correlation weight matrix built from the multi-level attention to select input fragments, which normalizes the input dimension. To address large intra-class differences, a composite loss function that incorporates a contrastive loss is proposed, which improves the model's ability to extract generalizable features. Experiments show that the proposed model achieves a best performance of 92.49% on a self-built dataset of calls from 14 bird species, and recognition accuracies of 94.38% and 97.74% on the public datasets BirdsData and BIRDS, respectively, exceeding existing methods.
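The two-stage idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no formulas, so everything here is an assumption made for illustration, including the single-level self-attention without learned projections, the per-fragment score (mean attention received), the pruning rule (`noise_coeff` times the mean score), and the hypothetical names `split_fragments`, `self_attention_weights`, `two_stage_select`, `frag_len`, `noise_coeff`, and `target_frags`.

```python
import numpy as np


def split_fragments(log_mel, frag_len):
    """Split a (n_mels, n_frames) log-Mel spectrogram into flattened time fragments."""
    n_mels, n_frames = log_mel.shape
    n_frags = n_frames // frag_len  # the incomplete tail fragment is dropped
    trimmed = log_mel[:, : n_frags * frag_len]
    # Result shape: (n_frags, n_mels * frag_len)
    blocks = trimmed.reshape(n_mels, n_frags, frag_len).transpose(1, 0, 2)
    return blocks.reshape(n_frags, -1)


def self_attention_weights(frags):
    """Row-wise softmax of scaled dot products between fragments (assumed form)."""
    d = frags.shape[1]
    logits = frags @ frags.T / np.sqrt(d)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)


def two_stage_select(log_mel, frag_len=8, noise_coeff=0.5, target_frags=4):
    """Return a fixed-size (target_frags, n_mels * frag_len) array and the kept indices."""
    frags = split_fragments(log_mel, frag_len)
    attn = self_attention_weights(frags)
    score = attn.mean(axis=0)  # average attention each fragment receives

    # Stage 1 (assumed rule): drop fragments scoring below noise_coeff * mean score.
    keep = np.flatnonzero(score >= noise_coeff * score.mean())

    # Stage 2 (assumed rule): keep the highest-scoring survivors, restored to time order.
    top = keep[np.argsort(score[keep])[::-1][:target_frags]]
    top.sort()

    out = np.zeros((target_frags, frags.shape[1]), dtype=frags.dtype)
    out[: len(top)] = frags[top]  # zero-pad when too few fragments survive
    return out, top
```

Calling `two_stage_select` on spectrograms with different numbers of frames returns arrays of the same shape, which is the dimension-normalization property the abstract describes. The model in the paper uses multi-level attention and a correlation weight matrix, which this sketch does not attempt to reproduce.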
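Keywords: birdsong recognition; multi-level attention; hash compression; contrastive loss; self-built dataset
Classification code: TP391.4 [Automation and Computer Technology: Computer Application Technology]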