Speaker retrieval based on deep speaker vector


Authors: LI Wei[1], YANG Jichen[1], HE Qianhua[1], LI Yanxiong[1]

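Affiliation: [1] School of Electronic and Information Engineering, South China University of Technology, Guangzhou, Guangdong 510640, China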

Source: Journal of Huazhong University of Science and Technology (Natural Science Edition), 2015, No. 7, pp. 62-65 (4 pages)

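Funding: National Natural Science Foundation of China (61301300); China Postdoctoral Science Foundation (2013M531850); Fundamental Research Funds for the Central Universities (2013ZM0097)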

Abstract: Shallow features cannot characterize speakers effectively, which leads to low speaker retrieval rates. To address this problem, a speaker retrieval method based on deep speaker vectors is proposed. First, a multilayer deep feature extractor is built by training restricted Boltzmann machines (RBMs) layer by layer, and it is used to extract deep speaker features. Second, a deep speaker vector based on these deep features is constructed for each speaker. Finally, the target speaker is retrieved by computing the minimum distance between the deep speaker vector of the query speaker and the deep features of the speakers in the retrieval library. Experimental results show that, with deep features, most target speakers can be retrieved using deep speaker vectors. The retrieval rates of the first and second layers are lower than that of mel-frequency cepstral coefficients (MFCC), and the retrieval rate of the third layer equals that of MFCC. As the number of layers increases, the retrieval rate first rises and then falls, peaking at a depth of 7 layers, while the retrieval time grows nonlinearly.
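The three steps in the abstract (layer-wise RBM training, a per-speaker deep vector, minimum-distance retrieval) can be illustrated with a small pure-Python sketch. It is not the authors' implementation, and several details are assumptions because the abstract does not state them: every layer is a Bernoulli-Bernoulli RBM trained with one-step contrastive divergence (CD-1), input frames are assumed already scaled to [0, 1], the "deep speaker vector" is assumed to be the frame-wise mean of the top-layer activations, library speakers are summarized the same way, and "minimum distance" is taken to be Euclidean nearest-neighbour ranking. The names `train_stack`, `speaker_vector`, `retrieve`, the layer sizes, learning rate, epoch count, and the synthetic data are all hypothetical.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class RBM:
    """Bernoulli-Bernoulli RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]
        self.b = [0.0] * n_visible  # visible biases
        self.c = [0.0] * n_hidden   # hidden biases
        self.rng = rng

    def hidden_probs(self, v):
        # p(h_j = 1 | v) for every hidden unit j
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.c))]

    def visible_probs(self, h):
        # p(v_i = 1 | h) for every visible unit i
        return [sigmoid(self.b[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.b))]

    def cd1_update(self, v0, lr):
        h0 = self.hidden_probs(v0)                                      # positive phase
        h_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]  # sample hidden states
        v1 = self.visible_probs(h_sample)                               # one Gibbs step back
        h1 = self.hidden_probs(v1)                                      # negative phase
        for i in range(len(self.b)):
            for j in range(len(self.c)):
                self.W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
            self.b[i] += lr * (v0[i] - v1[i])
        for j in range(len(self.c)):
            self.c[j] += lr * (h0[j] - h1[j])


def train_stack(frames, layer_sizes, epochs=5, lr=0.1, seed=0):
    """Greedy layer-wise training: each RBM is trained on the previous layer's output."""
    rng = random.Random(seed)
    stack, data = [], frames
    for n_hidden in layer_sizes:
        rbm = RBM(len(data[0]), n_hidden, rng)
        for _ in range(epochs):
            for v in data:
                rbm.cd1_update(v, lr)
        stack.append(rbm)
        data = [rbm.hidden_probs(v) for v in data]  # this layer's output feeds the next
    return stack


def deep_features(stack, frames):
    """Propagate frames through every layer; the top-layer activations are the deep features."""
    out = frames
    for rbm in stack:
        out = [rbm.hidden_probs(v) for v in out]
    return out


def speaker_vector(stack, frames):
    """Hypothetical deep speaker vector: the frame-wise mean of the deep features."""
    feats = deep_features(stack, frames)
    return [sum(col) / len(feats) for col in zip(*feats)]


def retrieve(stack, query_frames, library):
    """Rank library speakers by Euclidean distance to the query; the first entry is the minimum."""
    q = speaker_vector(stack, query_frames)
    dist = {name: math.dist(q, speaker_vector(stack, frames)) for name, frames in library.items()}
    return sorted(dist, key=dist.get)


if __name__ == "__main__":
    rng = random.Random(1)

    def fake_frames(center, n=20):
        # Synthetic stand-in for MFCC frames already scaled to [0, 1].
        return [[min(1.0, max(0.0, c + rng.gauss(0.0, 0.05))) for c in center] for _ in range(n)]

    library = {"spk_A": fake_frames([0.9, 0.1, 0.8, 0.2, 0.7, 0.3]),
               "spk_B": fake_frames([0.1, 0.9, 0.2, 0.8, 0.3, 0.7])}
    stack = train_stack([f for fr in library.values() for f in fr], layer_sizes=[8, 4])
    print(retrieve(stack, library["spk_A"], library))  # spk_A ranks first (distance 0)
```

The depth studied in the paper corresponds to the length of `layer_sizes`; a 7-layer extractor would pass seven hidden sizes. Pure Python keeps the sketch self-contained, but a practical system would vectorize the matrix products and would usually model real-valued MFCC input with a Gaussian-Bernoulli first layer instead of the Bernoulli layer assumed here.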

Keywords: deep feature; deep speaker vector; minimum distance; speaker retrieval; retrieval rate

CLC number: TP391 [Automation and Computer Technology / Computer Application Technology]

 
