Optimization Strategies for Deep Neural Network Techniques in Acoustic Modeling for Mandarin Speech Recognition (cited by: 5)

Optimization of deep neural network in acoustic modeling for Mandarin speech recognition

Affiliation: [1] Key Laboratory of Speech Acoustics and Content Understanding, Chinese Academy of Sciences, Beijing 100190

Source: Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), 2014, No. 3, pp. 373-379 (7 pages)

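Funding: National Natural Science Foundation of China (10925419; 90920302; 61072124; 11074275; 11161140319; 91120001; 61271426); Strategic Priority Research Program of the Chinese Academy of Sciences (XDA06030100; XDA06030500); National "863" Program (2012AA012503); Key Deployment Project of the Chinese Academy of Sciences (KGZD-EW-103-2)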

Abstract: A deep neural network (DNN) is introduced as the acoustic model in a Mandarin conversational telephone speech recognition system. To address the high character error rate (CER) on spontaneous speech, DNN-based acoustic modeling is optimized in three respects: the choice of acoustic feature type, the tuning of meta-parameters during training, and the improvement of model generalization. For the sparse distribution of state prior probabilities in the training samples, a state prior probability smoothing algorithm is proposed; it alleviates this data sparsity to some extent and reduces the CER by more than 1% absolute. On three conversational telephone speech test sets, the optimized DNN model consistently outperforms the baseline DNN model, with an average relative CER reduction of 15%. The experimental results show that the adopted optimization strategies effectively improve the performance of DNN-based acoustic models.
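The abstract does not spell out the state prior probability smoothing algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hybrid DNN-HMM setup in which DNN state posteriors are divided by state priors to obtain scaled likelihoods, and it swaps in plain additive (pseudo-count) smoothing so that rarely seen states do not receive near-zero priors. The function names and the `alpha` parameter are hypothetical.

```python
import math
from collections import Counter


def smoothed_log_priors(state_alignments, num_states, alpha=1.0):
    """Estimate log state priors from frame-level state labels.

    Assumption: additive smoothing with a hypothetical pseudo-count `alpha`.
    With alpha > 0, every state gets a non-zero prior even if it never
    appears in the alignments. With alpha = 0 an unseen state would have
    prior 0 and math.log would raise ValueError.
    """
    counts = Counter(state_alignments)
    total = len(state_alignments) + alpha * num_states
    return [
        math.log((counts.get(s, 0) + alpha) / total)
        for s in range(num_states)
    ]


def scaled_log_likelihoods(log_posteriors, log_priors):
    """Turn DNN log posteriors log p(s|x) into scaled log likelihoods.

    By Bayes' rule, log p(x|s) = log p(s|x) - log p(s) + log p(x); the
    last term is constant across states for a frame, so it is dropped.
    """
    return [lp - pr for lp, pr in zip(log_posteriors, log_priors)]


if __name__ == "__main__":
    # Toy alignment: state 2 never occurs, so its raw prior would be zero.
    priors = smoothed_log_priors([0, 0, 0, 1], num_states=3, alpha=1.0)
    frame_log_post = [math.log(0.5), math.log(0.3), math.log(0.2)]
    print(scaled_log_likelihoods(frame_log_post, priors))
```

Without smoothing, a state with a very small prior gets an inflated scaled likelihood after the division, which can distort decoding; the pseudo-count bounds that effect at the cost of slightly biasing the priors of frequent states.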

Keywords: deep neural network; speech recognition; hidden Markov model; probability smoothing

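Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]; TP391 [Automation and Computer Technology: Control Science and Engineering]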
