Research on Low-resource Mongolian Speech Recognition Based on Multilingual Speech Data Selection

Cited by: 1


Author: ZHANG Ai-ying (张爱英) (School of Mathematics and Quantitative Economics, Shandong University of Finance and Economics, Jinan 250014, China)

Affiliation: [1] School of Mathematics and Quantitative Economics, Shandong University of Finance and Economics, Jinan 250014

Source: Computer Science (《计算机科学》), 2018, No. 9, pp. 308-313 (6 pages)

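Funding: Supported by the National Natural Science Foundation of China (61305027), the Natural Science Foundation of Shandong Province (ZR2011FQ024), and the Shandong Province Higher Education Science and Technology Program (J17KB160)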

Abstract: Multilingual information can improve the performance of speech recognition systems for low-resource languages. However, not all multilingual speech data help when they are used to improve a low-resource target-language speech recognition system. This paper proposes a data selection method based on language identification with a long short-term memory recurrent neural network (LSTM-RNN). The more effective multilingual speech data selected in this way are used to train a multilingual deep neural network and a deep Bottleneck neural network. Both the deep neural network obtained through cross-lingual transfer learning and the Bottleneck features extracted by the deep Bottleneck neural network are of great help in improving the low-resource target-language speech recognition system. Compared with the baseline system, the proposed approach yields absolute word error rate reductions of 10.5% and 11.4%, respectively, when decoding with an interpolated Web-based language model.
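The abstract does not state how language identification scores are turned into a selection decision. As a minimal sketch only, the Python below assumes one plausible criterion: keep the non-target utterances to which a language identification model assigns a high posterior probability of being the target language. The `Utterance` class, the `select_utterances` function, the threshold value, and the example languages and scores are all hypothetical illustrations, not the paper's method or data.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Utterance:
    utt_id: str
    language: str                      # language the utterance was recorded in
    lid_posteriors: Dict[str, float]   # hypothetical LID output: language -> probability


def select_utterances(pool: List[Utterance], target_lang: str,
                      threshold: float) -> List[Utterance]:
    """Assumed criterion: keep non-target utterances whose LID posterior
    for the target language is at least `threshold`, best-scoring first."""
    scored = [
        (u.lid_posteriors.get(target_lang, 0.0), u)
        for u in pool
        if u.language != target_lang
    ]
    kept = [(score, u) for score, u in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [u for _, u in kept]


# Made-up pool of multilingual utterances with made-up LID scores.
pool = [
    Utterance("kaz_001", "kazakh", {"mongolian": 0.62, "kazakh": 0.30}),
    Utterance("eng_001", "english", {"mongolian": 0.05, "english": 0.90}),
    Utterance("tur_001", "turkish", {"mongolian": 0.41, "turkish": 0.50}),
]

selected = select_utterances(pool, target_lang="mongolian", threshold=0.4)
print([u.utt_id for u in selected])  # ['kaz_001', 'tur_001']
```

Under this assumption, the selected utterances would then form the multilingual training set for the multilingual deep neural network and the deep Bottleneck neural network; a top-N cutoff instead of a fixed threshold would be an equally reasonable variant.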

Keywords: Data selection; Low-resource; Multilingual deep neural network; Deep Bottleneck neural network

Classification number: TP391.42 [Automation and Computer Technology - Computer Application Technology]