Robust speaker recognition based on level-building voice activity detection


Authors: XIE Yanlu (解焱陆)[1], ZHANG Jinsong (张劲松)[1], LIU Minghui (刘明辉)[2], HUANG Zhongwei (黄中伟)[2]

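Affiliations: [1] School of Information Science, Beijing Language and Culture University, Beijing 100083; [2] Speech Laboratory, Shenzhen University, Shenzhen 518060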

Source: Journal of Shenzhen University (Science and Engineering), 2012, No. 4, pp. 328-334 (7 pages)

Funding: National Natural Science Foundation of China (61005020); Fundamental Research Funds for the Central Universities (10JBT01)

Abstract: Building on the ETSI-DSR-AFE standard (the Distributed Speech Recognition Advanced Front-End standard published by the European Telecommunications Standards Institute), this paper proposes a new front-end processing method, combining level building with a two-stage Wiener filter, to address the poor noise robustness of distributed speaker recognition. The speech is clustered in an unsupervised manner using a likelihood distance as the measure. To reduce the computational load, a level-building procedure segments the speech level by level, so that the boundaries between speech and silence are located accurately. Experiments show that, when the method is used to improve the ETSI-DSR-AFE standard and the SNR is greater than 0 dB, the recognition rate of the speaker identification system improves by a relative 18.9%. Relative to the original Mel-frequency cepstral coefficient (MFCC) system, the recognition rate improves by 60.7%.
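The abstract does not give the likelihood distance or the details of the level-building procedure, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a one-dimensional per-frame energy feature, scores each segment by its squared deviation from the segment mean (the negative log-likelihood of a fixed-variance Gaussian, up to constants), and builds the segmentation one level (one segment) at a time with dynamic programming. The names `segment_cost`, `level_building_segment`, and `label_speech`, and the midpoint threshold used for labelling, are hypothetical choices made for illustration.

```python
def segment_cost(prefix, prefix_sq, i, j):
    """Sum of squared deviations of frames i..j-1 from their mean.

    Assumption of this sketch: up to constants, this is the negative
    log-likelihood of the segment under one Gaussian with unit variance.
    """
    n = j - i
    s = prefix[j] - prefix[i]
    sq = prefix_sq[j] - prefix_sq[i]
    return max(sq - s * s / n, 0.0)


def level_building_segment(energy, num_segments):
    """Split `energy` into `num_segments` contiguous segments of minimum total cost.

    Level l reuses the best results of level l-1, so the segmentation is
    grown one segment at a time instead of enumerating all boundary sets.
    """
    n = len(energy)
    if not 1 <= num_segments <= n:
        raise ValueError("num_segments must be between 1 and len(energy)")

    # Prefix sums give O(1) segment costs.
    prefix, prefix_sq = [0.0], [0.0]
    for v in energy:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    inf = float("inf")
    # best[l][j]: minimum cost of covering frames 0..j-1 with l segments.
    best = [[inf] * (n + 1) for _ in range(num_segments + 1)]
    back = [[0] * (n + 1) for _ in range(num_segments + 1)]
    best[0][0] = 0.0

    for level in range(1, num_segments + 1):
        for j in range(level, n + 1):
            for i in range(level - 1, j):
                if best[level - 1][i] == inf:
                    continue
                cost = best[level - 1][i] + segment_cost(prefix, prefix_sq, i, j)
                if cost < best[level][j]:
                    best[level][j] = cost
                    back[level][j] = i

    # Backtrack from the last level to recover (start, end) boundaries.
    bounds, j = [], n
    for level in range(num_segments, 0, -1):
        i = back[level][j]
        bounds.append((i, j))
        j = i
    bounds.reverse()
    return bounds


def label_speech(energy, bounds):
    """Mark a segment as speech if its mean energy exceeds the midpoint
    between the lowest and highest segment means (a hypothetical rule)."""
    means = [sum(energy[i:j]) / (j - i) for i, j in bounds]
    threshold = (min(means) + max(means)) / 2.0
    return [m > threshold for m in means]


# Toy input: silence, speech, silence (idealised, noise-free energies).
energy = [1.0] * 5 + [9.0] * 6 + [1.0] * 4
bounds = level_building_segment(energy, 3)
labels = label_speech(energy, bounds)
print(bounds)  # [(0, 5), (5, 11), (11, 15)]
print(labels)  # [False, True, False]
```

On this toy input the detected boundaries fall at frames 5 and 11, where the energy level changes. A real front end would operate on noisy multi-dimensional features and would need a likelihood measure and a stopping criterion for the number of levels, neither of which is specified in the abstract.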

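Keywords: speech signal processing; speaker recognition; distributed speech recognition; level building; voice activity detection; likelihood distance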

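Classification: TN912.34 [Electronics and Telecommunications: Communication and Information Systems]; TP391.4 [Electronics and Telecommunications: Information and Communication Engineering]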

 
