Authors: YANG Guian; SHAO Yubin[1]; LONG Hua[1]; DU Qingzhi[1]
Affiliation: [1] Kunming University of Science and Technology, Kunming, Yunnan 650500, China
Source: Communications Technology (《通信技术》), 2021, No. 2, pp. 317-322 (6 pages)
Abstract: To address inaccurate speech/music classification of speech-only audio, music-only audio, and mixtures of the two, an audio classification algorithm based on audio segmentation is proposed. Segmentation using the energy-entropy ratio feature yields more accurate music segments, whereas segmentation using the root-mean-square amplitude feature yields more accurate speech segments and avoids over-segmenting speech. The start and end points of the segments produced by the two methods are sorted in ascending order and paired to form new audio segments, which constitute the segmentation result; each resulting segment contains a single type of audio. Two features, the kurtosis of the amplitude and the average fundamental frequency, are extracted from each segment, and a Gaussian mixture model serves as the back-end classifier. Finally, to eliminate over-segmentation, adjacent segments of the same type are merged to give the final classification result. Experiments show that the proposed algorithm achieves a segmentation accuracy of 98.24% on mixed audio and, using only two-dimensional features, classification accuracies of 98% on single-type audio and 98.61% on mixed audio, an average improvement of 3.80% over comparable algorithms.
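Keywords: audio classification; audio features; audio segmentation; kurtosis of amplitude; average fundamental frequency
Classification code: TP39 [Automation and Computer Technology: Computer Application Technology]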
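The boundary-combination and merging steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names `combine_boundaries` and `merge_adjacent` are hypothetical, and so are the choices to drop duplicate boundary points and to pair consecutive points. The segment labels stand in for the output of the Gaussian mixture model classifier, which is not reproduced here.

```python
def combine_boundaries(segs_a, segs_b):
    """Pool the start/end points of two segmentations, sort them in
    ascending order, and pair consecutive points into new segments.
    Assumption: duplicate points are dropped before pairing."""
    points = sorted({p for start, end in list(segs_a) + list(segs_b)
                     for p in (start, end)})
    return list(zip(points[:-1], points[1:]))


def merge_adjacent(labeled_segments):
    """Merge neighbouring segments that touch and share a label.
    Each item is (start, end, label); labels are placeholders for
    the classifier output."""
    merged = []
    for start, end, label in labeled_segments:
        if merged and merged[-1][2] == label and merged[-1][1] == start:
            # Extend the previous segment instead of starting a new one.
            merged[-1] = (merged[-1][0], end, label)
        else:
            merged.append((start, end, label))
    return merged


# Hypothetical boundaries (in seconds) from the two segmentation methods.
energy_entropy_segs = [(0.0, 2.0), (2.0, 5.0)]
rms_segs = [(0.0, 3.0), (3.0, 5.0)]

segments = combine_boundaries(energy_entropy_segs, rms_segs)

# Hypothetical labels, as a classifier might assign them per segment.
labels = ["speech", "speech", "music"]
labeled = [(s, e, lab) for (s, e), lab in zip(segments, labels)]
final = merge_adjacent(labeled)
```

With these made-up inputs, the two speech pieces that share the boundary at 2.0 s are merged back into one segment, which is the effect the abstract attributes to the final over-segmentation step.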