Authors: ZENG Chunyan (曾春艳)[1], MA Chaofeng (马超峰), WANG Zhifeng (王志锋)[2], KONG Xiangbin (孔祥斌)
Affiliations: [1] Hubei Key Laboratory for High-efficiency Utilization of Solar Energy and Operation Control of Energy Storage System, Hubei University of Technology, Wuhan 430068, Hubei, China; [2] Department of Digital Media Technology, Central China Normal University, Wuhan 430079, Hubei, China; [3] School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China
Source: Journal of Huazhong University of Science and Technology (Natural Science Edition), 2020, No. 6, pp. 39-44 (6 pages)
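Funding: National Natural Science Foundation of China (61901165, 61501199); Science and Technology Research Project of the Hubei Provincial Department of Education (Q20191406); Natural Science Foundation of Hubei Province (2017CFB683); Hubei Province Outstanding Young and Middle-aged Science and Technology Innovation Team Project for Higher Education Institutions (T201805).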
Abstract: To improve speaker recognition performance in complex noisy environments, a robust speaker recognition method based on a Gaussian mean matrix and a convolutional neural network (CNN) is proposed, targeting the scenario in which a model trained on clean speech is tested on noisy speech. The Gaussian mean matrix is obtained by adapting conventional Mel-frequency cepstral coefficient (MFCC) features with maximum a posteriori (MAP) estimation; this operation increases the correlation between frames, so the features carry richer speaker-identity information. The CNN then further aligns the frame-level information and learns from the data a feature representation better suited to speaker recognition, which improves robustness. Experimental results on the Libri speech dataset show that the proposed method is more robust than the GMM-UBM and GSV-SVM algorithms.
Keywords: speaker recognition; robustness; convolutional neural network; Gaussian mean matrix; maximum a posteriori
Classification: TN912.34 [Electronics and Telecommunications: Communication and Information Systems]
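Note on the abstract: the record does not give the adaptation formula, so the following is only a minimal sketch of how a Gaussian mean matrix could be derived from MFCC frames, assuming the standard relevance-factor MAP mean adaptation of a diagonal-covariance universal background model (UBM). The function names, the relevance factor of 16, and the toy UBM are illustrative assumptions, not details taken from the paper.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at frame x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Return the K x D matrix of MAP-adapted component means.

    frames: list of D-dimensional feature vectors (e.g. MFCC frames)
    weights, means, variances: parameters of a K-component UBM
    relevance: assumed relevance factor controlling trust in the UBM prior
    """
    K, D = len(means), len(means[0])
    n = [0.0] * K                      # soft frame counts per component
    s = [[0.0] * D for _ in range(K)]  # posterior-weighted frame sums

    for x in frames:
        logp = [
            math.log(weights[k]) + log_gauss_diag(x, means[k], variances[k])
            for k in range(K)
        ]
        top = max(logp)                # subtract the max for numerical stability
        post = [math.exp(lp - top) for lp in logp]
        z = sum(post)
        for k in range(K):
            gamma = post[k] / z        # posterior of component k for this frame
            n[k] += gamma
            for d in range(D):
                s[k][d] += gamma * x[d]

    adapted = []
    for k in range(K):
        alpha = n[k] / (n[k] + relevance)  # data-dependent mixing weight
        row = []
        for d in range(D):
            ex = s[k][d] / n[k] if n[k] > 0 else means[k][d]
            row.append(alpha * ex + (1.0 - alpha) * means[k][d])
        adapted.append(row)
    return adapted


# Toy example: a 2-component, 1-dimensional UBM and 16 identical frames.
ubm_weights = [0.5, 0.5]
ubm_means = [[0.0], [10.0]]
ubm_vars = [[1.0], [1.0]]
frames = [[1.0]] * 16

matrix = map_adapt_means(frames, ubm_weights, ubm_means, ubm_vars)
```

In this toy case the first component absorbs almost all of the frames, so its mean moves halfway from 0.0 toward 1.0, while the second mean stays essentially at 10.0. With real data each row of `matrix` would be one adapted Gaussian mean, and the K x D matrix as a whole would serve as the CNN input described in the abstract.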