Authors: Chen Aihua[1], Zhang Shiqing[1]
Affiliation: [1] School of Electronics and Information Engineering, Taizhou University, Taizhou 318000, Zhejiang, China
Source: Journal of Taizhou University, 2021, No. 3, pp. 1-6, 33 (7 pages in total)
Funding: National Natural Science Foundation of China (61976149); Zhejiang Provincial Natural Science Foundation (LZ20F020002).
Abstract: To address the low detection rate and poor robustness of current single-feature voice activity detection algorithms, a voice activity detection algorithm based on a deep neural network and multi-feature fusion is proposed. First, Gammatone, Gabor, and LTSV filtering are used to extract the cochlear features, short-term features, and long-term signal variability features of an audio file, respectively. Next, the three feature sets are normalized, fused, and used as the input to the deep neural network. A pre-trained deep neural network model then computes the probability that each speech segment is speech or non-speech, which determines the segment's label. Finally, median filtering removes falsely detected points to complete the voice activity detection. To verify the effectiveness of the algorithm, simulation experiments are run on speech signals collected in multiple environments. The results show that the algorithm can perform voice activity detection in environments with high noise intensity, with good accuracy and robustness.
Keywords: deep neural network; Gammatone filter; Gabor filter; LTSV filter; voice activity detection
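Classification: TN912.34 [Electronics and Telecommunications: Communication and Information Systems]; TP183 [Electronics and Telecommunications: Information and Communication Engineering]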
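The normalization, fusion, and median-filtering steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record gives no parameters, so the z-score normalization, the 0.5 decision threshold, the window width, and every function name below are assumptions made for illustration. Feature extraction (Gammatone, Gabor, LTSV) and the neural network itself are left out, and the per-frame speech probabilities are taken as given.

```python
import numpy as np


def zscore(features):
    """Scale each feature dimension to zero mean and unit variance over frames.

    Assumption: the abstract only says the features are normalized; z-scoring
    is one common choice, not necessarily the one used in the paper.
    """
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    # Avoid division by zero for constant feature dimensions.
    return (features - mean) / np.where(std > 0, std, 1.0)


def fuse(*feature_sets):
    """Normalize each (frames, dims) matrix, then concatenate along the feature axis."""
    return np.concatenate([zscore(f) for f in feature_sets], axis=1)


def median_smooth(labels, width=5):
    """Sliding-window median over a 0/1 label sequence (odd width, edge-padded)."""
    if width % 2 == 0:
        raise ValueError("width must be odd")
    labels = np.asarray(labels)
    half = width // 2
    padded = np.pad(labels, half, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, width)
    return np.median(windows, axis=1).astype(int)


def decide(speech_prob, threshold=0.5, width=5):
    """Threshold per-frame speech probabilities, then remove isolated misdetections."""
    raw = (np.asarray(speech_prob) >= threshold).astype(int)
    return median_smooth(raw, width)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical feature matrices standing in for the three feature types.
    cochlear = rng.normal(size=(13, 4))
    short_term = rng.normal(size=(13, 3))
    long_term = rng.normal(size=(13, 1))
    print(fuse(cochlear, short_term, long_term).shape)

    # Hypothetical network outputs: probability of speech for each frame.
    probs = [0.1, 0.2, 0.9, 0.1, 0.2, 0.8, 0.9, 0.7, 0.3, 0.9, 0.8, 0.2, 0.1]
    print(decide(probs, width=3))
```

With an odd window width, the median of a 0/1 sequence is a majority vote over neighbouring frames, so a single frame that disagrees with its neighbours is flipped while longer speech runs are kept.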