Efficient Conformer Model Based on Factorized Gated Attention Unit


Authors: LI Yiting; QU Dan; YANG Xukui; ZHANG Hao; SHEN Xiaolong (College of Information Systems Engineering, PLA Strategic Support Force Information Engineering University, Zhengzhou 450001, China)

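Affiliation: [1] College of Information Systems Engineering, PLA Strategic Support Force Information Engineering University, Zhengzhou 450001, China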

Source: Computer Engineering, 2023, No. 5, pp. 73-80 (8 pages)

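Funding: National Natural Science Foundation of China (62171470); Henan Province Zhongyuan Science and Technology Innovation Leading Talent Project (234200510019); General Program of the Natural Science Foundation of Henan Province (232300421240).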

Abstract: To make use of limited storage and computing resources, this study builds an efficient Conformer model based on the Factorized Gated Attention Unit (FGAU) and low-rank decomposition. The aim is to reduce the number of model parameters and speed up training and recognition while preserving the accuracy of the Conformer end-to-end speech recognition model. In the feedforward and convolution modules, low-rank decomposition accelerates computation and improves the generalization ability of the Conformer model. In the self-attention module, the FGAU reduces the computational complexity of attention, and a cosine weighting mechanism weights the gated attention so that it concentrates on neighboring positions, which improves recognition accuracy. Experimental results on the AISHELL-1 dataset show that, after FGAU and cosine encoding are introduced, the number of parameters and the speech recognition Character Error Rate (CER) of the model are clearly reduced. In particular, when the number of parameters is compressed to 50% of that of the Conformer end-to-end speech recognition model, the CER increases by only 0.34 percentage points, so the model combines low computational complexity with high recognition accuracy.
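The low-rank decomposition mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the common approach of replacing one weight matrix of shape (d_in, d_out) with two factors of shapes (d_in, r) and (r, d_out), obtained here by truncated SVD. The layer sizes, the rank, and the helper names `low_rank_factorize` and `param_count` are hypothetical and chosen only for illustration.

```python
import numpy as np


def low_rank_factorize(weight, rank):
    """Approximate `weight` (d_in x d_out) by u_r @ v_r via truncated SVD."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    # Split each kept singular value evenly between the two factors.
    root_s = np.sqrt(s[:rank])
    u_r = u[:, :rank] * root_s          # shape (d_in, rank)
    v_r = root_s[:, None] * vt[:rank]   # shape (rank, d_out)
    return u_r, v_r


def param_count(*arrays):
    """Total number of scalar parameters across the given arrays."""
    return sum(a.size for a in arrays)


# Hypothetical sizes, not taken from the paper.
d_in, d_out, rank = 256, 1024, 64
rng = np.random.default_rng(0)
weight = rng.standard_normal((d_in, d_out))
u_r, v_r = low_rank_factorize(weight, rank)

# The factorized layer applies two small matrix products instead of one large one.
x = rng.standard_normal((8, d_in))
y_full = x @ weight
y_low = (x @ u_r) @ v_r

print("full parameters:    ", param_count(weight))    # d_in * d_out
print("low-rank parameters:", param_count(u_r, v_r))  # rank * (d_in + d_out)
```

With these assumed sizes the factorized layer stores rank × (d_in + d_out) values instead of d_in × d_out, which is where the parameter and compute savings come from. The approximation error depends on how quickly the singular values of the original matrix decay, so in practice the rank is a tuning choice.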

Keywords: end-to-end speech recognition; Conformer model; Factorized Gated Attention Unit (FGAU); model compression; low-rank decomposition

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
