End-to-end speech recognition method based on prosodic features

Authors: LIU Cong (刘聪) [1]; WAN Genshun (万根顺) [1]; GAO Jianqing (高建清) [1]; FU Zhonghua (付中华) [2]

Affiliations: [1] AI Institute, iFLYTEK Company Limited, Hefei, Anhui 230088, China; [2] Xi'an iFLYTEK Hyper-brain Information Technology Company Limited, Xi'an, Shaanxi 710000, China

Source: Journal of Computer Applications (《计算机应用》), 2023, No. 2, pp. 380-384 (5 pages)

Funding: Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2020AAA0103600).

Abstract: Traditional speech recognition systems are data-driven and rely on a language model to select the optimal decoding path. As a result, in some scenarios the decoded text matches the pronunciation but contains the wrong characters. To address this problem, an end-to-end speech recognition method assisted by prosodic features is proposed, which uses the prosodic information in speech to raise the language-model probability of the correct character combinations. Building on an attention-based encoder-decoder speech recognition framework, the method first uses the distribution of the attention coefficients to extract prosodic features such as pronunciation interval and pronunciation energy. It then combines these prosodic features with the decoder, which significantly improves recognition accuracy for utterances with identical or similar pronunciation and semantic ambiguity. Experimental results show that, on 1000 h and 10000 h speech recognition tasks, the proposed method achieves relative accuracy improvements of 5.2% and 5.0%, respectively, over the baseline end-to-end speech recognition method, and further improves the intelligibility of the recognition results.

Keywords: speech recognition; end-to-end; semantic ambiguity; attention mechanism; prosodic features

CLC number: TP391.4 [Automation and Computer Technology: Computer Application Technology]
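
The abstract states that prosodic features such as pronunciation interval and pronunciation energy are extracted from the distribution of the attention coefficients, but this record gives no formulas. The following is only a minimal sketch of one way such features could be computed, not the authors' method. It assumes that each output token has one row of attention weights over the encoder frames, that the interval is the distance between the attention-weighted centers of adjacent tokens, and that the energy is the attention-weighted mean of a per-frame energy value. The function name `prosody_from_attention`, the 10 ms frame shift, and all numbers in the example are hypothetical.

```python
from typing import List, Tuple


def prosody_from_attention(
    attention: List[List[float]],
    frame_energy: List[float],
    frame_shift_s: float = 0.01,  # assumed frame shift (hypothetical value)
) -> Tuple[List[float], List[float]]:
    """Return (intervals, energies), one value per output token.

    attention[i][t] is the attention weight of output token i on frame t.
    frame_energy[t] is an energy value for frame t (for example, log energy).
    """
    centers: List[float] = []
    energies: List[float] = []
    for weights in attention:
        if len(weights) != len(frame_energy):
            raise ValueError("attention row and frame_energy differ in length")
        total = sum(weights)
        if total <= 0:
            raise ValueError("each attention row must have positive mass")
        norm = [w / total for w in weights]
        # Attention-weighted center of the token, in frames.
        centers.append(sum(t * w for t, w in enumerate(norm)))
        # Attention-weighted mean energy of the token.
        energies.append(sum(e * w for e, w in zip(frame_energy, norm)))
    # Interval of token i: time between its center and the previous token's
    # center, in seconds. The first token has no predecessor and gets 0.0.
    intervals = [0.0] + [
        (centers[i] - centers[i - 1]) * frame_shift_s
        for i in range(1, len(centers))
    ]
    return intervals, energies


if __name__ == "__main__":
    # Toy example: 3 output tokens attending over 6 frames.
    toy_attention = [
        [0.5, 0.5, 0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.5, 0.5, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
    ]
    toy_energy = [1.0, 1.0, 2.0, 2.0, 4.0, 4.0]
    print(prosody_from_attention(toy_attention, toy_energy))
```

In a setup like the one the abstract describes, per-token values of this kind would then be passed to the decoder alongside its usual inputs. How the paper actually combines them is not specified in this record.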

 
