A Non-Specific Person Speech Recognition Model Based on Improved DFSMN


Authors: WANG Shigang; YAN Jin (School of Automation, Guangxi University of Science and Technology, Liuzhou 545616, China)

Affiliation: [1] School of Automation, Guangxi University of Science and Technology, Liuzhou, Guangxi 545616, China

Source: Audio Engineering (《电声技术》), 2023, No. 12, pp. 111-114 (4 pages)

Abstract: The Deep Feedforward Sequential Memory Network (DFSMN) is an acoustic model with high recognition accuracy that has been applied successfully to non-specific person (speaker-independent) speech recognition, but it suffers from parameter redundancy and is difficult to train. To address this, the paper proposes a non-specific person speech recognition model based on an improved DFSMN. The model changes the size of the DFSMN memory modules and the way the modules are connected, and combines the network with the Connectionist Temporal Classification (CTC) end-to-end speech recognition framework. Experimental results show that, under the same conditions, the improved model has about 1/10 fewer parameters than the original DFSMN. Compared with several common speech recognition models on different datasets, it also achieves the lowest character error rate, indicating advantages in both recognition accuracy and training efficiency.
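The abstract compares models by character error rate. As a point of reference, here is a minimal sketch of how that metric is commonly computed: the Levenshtein edit distance between the reference and recognized character sequences, divided by the reference length. The function names `edit_distance` and `character_error_rate` are illustrative assumptions and are not taken from the paper, which may use a different scoring tool or text normalization.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (e.g. strings of characters)."""
    # prev[j] holds the distance between the reference prefix processed so far
    # and the first j symbols of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference character
                curr[j - 1] + 1,     # insertion of a hypothesis character
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref, hyp):
    """Edit distance divided by reference length (assumed definition of CER)."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

For example, a four-character reference recognized with one wrong character gives `character_error_rate("abcd", "abxd") == 0.25`. Because insertions are counted, the value can exceed 1 when the hypothesis is much longer than the reference.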

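Keywords: speech recognition; Deep Feedforward Sequential Memory Network (DFSMN); non-specific person; Connectionist Temporal Classification (CTC)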

Classification number: TN912.34 [Electronics and Telecommunications: Communication and Information Systems]

 
