Authors: CHEN Shuli; ZHANG Xueshuai; ZHANG Pengyuan[1,2]; LIU Jian
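Affiliations: [1] Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190; [2] University of Chinese Academy of Sciences, Beijing 100049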
Source: Acta Acustica (《声学学报》), 2022, No. 4, pp. 531-540 (10 pages)
Abstract: To address the low recognition rate of audio retrieval under interference such as background sound and noise, an audio fingerprint retrieval algorithm based on silence masking and frequency-domain segmentation is proposed. First, voice activity detection is applied as preprocessing; the valid speech frames are recombined, and fingerprint features are extracted from them using the energy differences between adjacent sub-bands, which effectively addresses the poor robustness of fingerprints taken from silent frames. Then, during retrieval matching, the audio fingerprint is segmented and weighted across frequency ranges according to how different audio signals are distributed in the frequency domain, so that the similarity between the template and the query audio is computed more precisely. Experiments show that, compared with the Philips baseline algorithm, the proposed algorithm doubles retrieval speed; on data sets disturbed by background sound and similar interference, average precision and recall improve by 17.94% and 4.66% absolute, respectively. Compared with the latest Philips algorithm, average precision and recall improve by 13.68% and 2.45% absolute, respectively.
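The three stages named in the abstract (silence masking, fingerprinting from adjacent sub-band energy differences, and frequency-segmented weighted matching) can be illustrated with a minimal sketch. This is not the authors' implementation: the energy-threshold silence test stands in for a real voice activity detector, the bit rule follows the commonly described Philips-style scheme, and the function names, segment boundaries, and weights are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[float]   # sub-band energies of one frame
Bits = List[List[int]]    # one row of fingerprint bits per frame pair


def drop_silent_frames(band_energies: Sequence[Frame], threshold: float) -> List[Frame]:
    """Keep frames whose total energy reaches the threshold.

    Assumption: a crude energy gate used here in place of a real VAD.
    """
    return [frame for frame in band_energies if sum(frame) >= threshold]


def fingerprint(band_energies: Sequence[Frame]) -> Bits:
    """Philips-style bits: adjacent-band energy difference, differenced over time."""
    bits: Bits = []
    for prev, cur in zip(band_energies, band_energies[1:]):
        row = []
        for m in range(len(cur) - 1):
            d = (cur[m] - cur[m + 1]) - (prev[m] - prev[m + 1])
            row.append(1 if d > 0 else 0)
        bits.append(row)
    return bits


def weighted_distance(
    fp_a: Bits,
    fp_b: Bits,
    segments: Sequence[Tuple[int, int]],
    weights: Sequence[float],
) -> float:
    """Weighted bit error rate over frequency segments.

    Each segment is a (start, end) range of bit indices, end exclusive.
    Segment boundaries and weights are hypothetical tuning parameters.
    """
    n_frames = min(len(fp_a), len(fp_b))
    if n_frames == 0:
        raise ValueError("no frames to compare")
    total = 0.0
    for (start, end), w in zip(segments, weights):
        errors = sum(
            fp_a[n][m] != fp_b[n][m]
            for n in range(n_frames)
            for m in range(start, end)
        )
        total += w * errors / (n_frames * (end - start))
    return total / sum(weights)


# Toy input: 4 frames of 5 sub-band energies; the first frame is silent.
energies = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [5.0, 3.0, 4.0, 1.0, 2.0],
    [1.0, 4.0, 2.0, 6.0, 3.0],
    [6.0, 2.0, 5.0, 1.0, 4.0],
]
voiced = drop_silent_frames(energies, threshold=1.0)
template_fp = fingerprint(voiced)
```

A lower distance means a closer match. Giving a segment a larger weight makes bit errors in that frequency range count more toward the final score, which is the general idea behind weighting fingerprint bits by frequency region.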
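Keywords: retrieval speed; retrieval algorithm; fingerprint feature; audio fingerprint; average precision; similarity; audio signal; audio retrieval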
Classification number: TN912.3 [Electronics and Telecommunications: Communication and Information Systems]