CTC-Conformer Chinese Speech Recognition Model Based on Masked Multi-Head Attention

Combining CTC with transformer model for implementing Chinese speech recognition


Authors: 黄天圆 (HUANG Tianyuan); 王超 (WANG Chao) (School of Information and Electrical Engineering, Hebei University of Engineering, Handan 056038, Hebei, China)

Affiliation: [1] School of Information and Electrical Engineering, Hebei University of Engineering, Handan 056038, Hebei, China

Source: Intelligent Computer and Applications (《智能计算机与应用》), 2025, No. 2, pp. 162-167 (6 pages)

Funding: General Program of the Natural Science Foundation of Hebei Province (A2020402013).

Abstract: Conformer is one of the most widely used models for language processing tasks. It combines features of the Transformer and convolutional neural networks, so it captures both local and global sequence features and better models the structure and context of the input data. However, the alignment between audio and text in existing Conformer models is uncertain, and the multi-head attention they use leaks input information from future time steps into the current time step. To address these problems, connectionist temporal classification (CTC) is used as an auxiliary training mechanism, which both improves the robustness of the Conformer model based on the Macaron-Net structure and resolves the misalignment between audio and text. In the decoder, a masked multi-head self-attention mechanism ensures that at time step t the model cannot see input information from future time steps, so that predictions rely only on the tokens already generated. Experimental results show that the CTC-Conformer model based on masked multi-head attention achieves a lower character error rate and a lower loss than the Conformer model, with the lowest loss value reaching 3.24.
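The two mechanisms named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses a single attention head with no learned projections, and the names `causal_self_attention` and `ctc_collapse`, as well as the `"_"` blank symbol, are assumptions made for illustration. The first function shows how a causal mask keeps position t from attending to positions after t; the second shows the CTC collapsing rule (merge repeated symbols, then drop blanks) that lets a frame-level path map to a shorter label sequence without an explicit audio-text alignment.

```python
import math


def causal_self_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    q, k, v: lists of T vectors (lists of floats); q and k share dimension d.
    Returns (outputs, weights), where weights[t][s] is 0.0 for every s > t.
    """
    d = len(q[0])
    n = len(q)
    outputs, weights = [], []
    for t in range(n):
        # Score only positions s <= t. Skipping s > t is equivalent to
        # setting those scores to -inf before the softmax.
        scores = [sum(a * b for a, b in zip(q[t], k[s])) / math.sqrt(d)
                  for s in range(t + 1)]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in scores]
        z = sum(exps)
        row = [e / z for e in exps] + [0.0] * (n - t - 1)
        weights.append(row)
        # Weighted sum of value vectors; future values get weight 0.0.
        outputs.append([sum(row[s] * v[s][j] for s in range(n))
                        for j in range(len(v[0]))])
    return outputs, weights


def ctc_collapse(path, blank="_"):
    """Map a frame-level CTC path to labels: merge repeats, then drop blanks."""
    out = []
    prev = None
    for sym in path:
        if sym != prev and sym != blank:
            out.append(sym)
        prev = sym
    return out
```

For example, `ctc_collapse(list("aa_bb_b"))` returns `['a', 'b', 'b']`: the blank between the two runs of `b` is what allows a genuinely repeated label to survive the merge. With the causal mask, changing the last input frame leaves the outputs at all earlier positions unchanged, which is the property the abstract describes as preventing leakage of future information.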

Keywords: Conformer; CTC; masked multi-head attention; language processing

Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
