An End-to-End Mandarin Speech Recognition Method Based on CNN/CTC (一种基于CNN/CTC的端到端普通话语音识别方法)  Cited by: 3


Authors: PAN Yuecheng (潘粤成), LIU Zhuo (刘卓), PAN Wenhao (潘文豪), CAI Dianlun (蔡典仑), WEI Zhengsong (韦政松)

Affiliations: [1] School of Automation Science and Engineering, South China University of Technology, Guangzhou 510641, Guangdong, China [2] School of Mechanical and Automotive Engineering, South China University of Technology, Guangzhou 510641, Guangdong, China

Source: Modern Information Technology (《现代信息科技》), 2020, No. 5, pp. 65-68 (4 pages)

Funding: National Undergraduate Innovation and Entrepreneurship Training Program (201910561167).

Abstract: To achieve offline Mandarin Chinese speech recognition with relatively high accuracy, this paper proposes an acoustic model for a speech recognition system based on a deep fully convolutional neural network (CNN). The model takes spectrograms as input, and its structure draws on the VGG model. At the output end, the model combines well with connectionist temporal classification (CTC), which allows the entire model to be trained end to end and converts the acoustic signal directly into a Mandarin Pinyin sequence. The language model uses a maximum entropy Markov model to convert the Pinyin sequence into Chinese text. Experiments show that the algorithm achieves 80.82% accuracy on the test set.
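
The abstract says the CNN output is paired with CTC to produce a Pinyin sequence but does not describe the decoding step. As a minimal sketch of one common way to read a label sequence out of CTC frame scores, the snippet below uses greedy (best-path) decoding: take the highest-scoring label per frame, merge consecutive repeats, then drop the blank symbol. The blank index, the toy vocabulary `VOCAB`, the scores, and the function name `ctc_greedy_decode` are all assumptions made for illustration; the paper may use a different decoder.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol (not specified in the source)


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      blank: int = BLANK) -> List[int]:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # Highest-scoring label index for each time frame.
    best_path = [max(range(len(frame)), key=lambda i: frame[i])
                 for frame in frame_scores]

    decoded: List[int] = []
    previous = None
    for label in best_path:
        # Keep a label only when it differs from the previous frame's label
        # and is not the blank symbol.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


# Hypothetical toy vocabulary of tone-marked Pinyin syllables.
VOCAB = ["<blank>", "ni3", "hao3"]

# Hypothetical per-frame scores (rows: time frames, columns: vocabulary entries).
scores = [
    [0.10, 0.80, 0.10],  # ni3
    [0.10, 0.70, 0.20],  # ni3 again, merged with the previous frame
    [0.90, 0.05, 0.05],  # blank
    [0.20, 0.10, 0.70],  # hao3
    [0.60, 0.20, 0.20],  # blank
]

pinyin = [VOCAB[i] for i in ctc_greedy_decode(scores)]
print(pinyin)  # ['ni3', 'hao3']
```

In a pipeline like the one the abstract outlines, a Pinyin sequence of this kind would then be passed to the language model for conversion into Chinese text.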

Keywords: convolutional neural network; Chinese speech recognition; connectionist temporal classification; end-to-end system

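Classification: TN912.34 [Electronics and Telecommunications: Communication and Information Systems]; TP399 [Electronics and Telecommunications: Information and Communication Engineering]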

 
