Authors: WANG Yongsen (王詠森); LIU Qian (刘倩); LIU Libo (刘立波) (College of Information Engineering, Ningxia University, Yinchuan, Ningxia 750021, China)
Source: Journal of Chinese Information Processing (《中文信息学报》), 2025, No. 1, pp. 167-174 (8 pages)
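Funding: Key Research and Development Program of Ningxia Hui Autonomous Region (2022BEG03073); National Natural Science Foundation of China (62262053); Ningxia Science and Technology Innovation Leading Talent Program (2022GKLRLX03).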
Abstract: Existing Conformer-based speech recognition models have three weaknesses. They extract time-frequency features poorly, their structure is redundant, and they have many parameters. To address these problems, this paper proposes ACGFN, a speech recognition model based on asymmetric convolution and a gated feedforward neural network.

First, the model applies asymmetric convolutions with different receptive field sizes to the time-frequency features of the speech sequence. This performs multi-scale fusion and downsampling in one step. It strengthens the model's ability to extract time-frequency features and effectively reduces the information lost during downsampling.

Second, a gated feedforward module replaces the two half-step feedforward networks in Conformer. This reduces the number of network parameters and simplifies the model structure.

On the test sets of the public AISHELL-1 and aidatatang_200zh datasets, the proposed method achieves character error rates (CER) of 4.48% and 4.28%, respectively, with only 40.3M parameters. Compared with the baseline methods, it lowers both the CER and the parameter count.
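The abstract does not specify how the gated feedforward module is built. The Python sketch below assumes a GLU-style design, in which a value projection is multiplied elementwise by a SiLU-activated gate projection (one of the variants in Shazeer's "GLU Variants Improve Transformer"). Under that assumption, it shows why replacing Conformer's pair of half-step FFNs with a single gated FFN can reduce parameters.

Several details are hypothetical choices made for illustration, not the paper's configuration:
- the width `d_model = 256`;
- the hidden size `4 * d_model`;
- every function name below.

```python
import math
from typing import List

Matrix = List[List[float]]  # row-major, shape (out_features, in_features)
Vector = List[float]


def linear_params(d_in: int, d_out: int, bias: bool = True) -> int:
    """Weight count (plus optional bias) of one fully connected layer."""
    return d_in * d_out + (d_out if bias else 0)


def macaron_ffn_params(d_model: int, expansion: int = 4) -> int:
    """Conformer-style pair of half-step FFNs, each d -> expansion*d -> d.
    LayerNorm parameters are ignored for simplicity."""
    hidden = expansion * d_model
    one_ffn = linear_params(d_model, hidden) + linear_params(hidden, d_model)
    return 2 * one_ffn


def gated_ffn_params(d_model: int, hidden: int) -> int:
    """Hypothetical GLU-style gated FFN: value and gate projections d -> hidden,
    followed by a single output projection hidden -> d."""
    return 2 * linear_params(d_model, hidden) + linear_params(hidden, d_model)


def silu(x: float) -> float:
    """Numerically stable SiLU: x * sigmoid(x)."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def affine(w: Matrix, b: Vector, x: Vector) -> Vector:
    """Compute w @ x + b for a row-major weight matrix."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def gated_ffn(x: Vector, w_val: Matrix, b_val: Vector, w_gate: Matrix,
              b_gate: Vector, w_out: Matrix, b_out: Vector) -> Vector:
    """Gated FFN for one frame vector: (value * SiLU(gate)) projected back to d."""
    value = affine(w_val, b_val, x)
    gate = [silu(g) for g in affine(w_gate, b_gate, x)]
    hidden = [v * g for v, g in zip(value, gate)]
    return affine(w_out, b_out, hidden)


if __name__ == "__main__":
    d = 256
    print(macaron_ffn_params(d))       # 1051136
    print(gated_ffn_params(d, 4 * d))  # 788736
```

At the same hidden size, the gated FFN holds three d×4d weight matrices, whereas the two half-step FFNs hold four. The gated FFN therefore has about 25% fewer FFN parameters per block. These counts do not reproduce the paper's 40.3M total, which depends on the full model configuration reported in the paper.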
CLC number: TP391 [Automation and Computer Technology / Computer Application Technology]