Authors: Zhao Xiaofen, Zhang Kaisheng [1] (ZHAO Xiaofen; ZHANG Kaisheng; School of Electrical and Control Engineering, Shaanxi University of Science and Technology, Xi'an, Shaanxi 710021, China)
Affiliation: [1] School of Electrical and Control Engineering, Shaanxi University of Science and Technology, Xi'an, Shaanxi 710021
Source: Journal of Shihezi University (Natural Science), 2022, No. 1, pp. 127-132 (6 pages)
Funding: National Natural Science Foundation of China (61601271); Shaanxi Provincial Science and Technology Program (2017GY-063).
Abstract: Speaker, environment, and pronunciation variability remain the main problems to be solved in acoustic modeling for speech recognition. To overcome these adverse factors, this paper applies a convolutional neural network (CNN) with a three-layer structural optimization to speech recognition. The convolutional invariance of the CNN is used to cope with the variability of speech signals. The convolutional layers are improved with a new activation function that more closely matches the characteristics of biological neurons, alleviating the vanishing-gradient problem; the pooling layers are improved with an intermediate pooling method to reduce feature-extraction error; and convolutional layers replace the fully connected layers to reduce model complexity. The method is then evaluated against comparison methods on multiple metrics. The results show that, compared with the comparison algorithms, the proposed method reduces the average recognition error rate by 22.05% on a Chinese speech dataset and by 20.27% on an English speech dataset. Compared with a traditional CNN model, its loss value is relatively reduced by 40%, which improves the generalization ability of the model to a certain extent.
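The abstract says model complexity is reduced by using a convolutional layer in place of the fully connected layer, but the record gives no layer sizes. The sketch below assumes one common form of that substitution (a 1x1 convolution followed by parameter-free global average pooling) and compares its parameter count with a flatten-plus-dense head. The function names and the sizes in the usage lines are hypothetical illustrations, not the authors' implementation.

```python
from typing import List

Matrix = List[List[float]]


def dense_head_params(channels: int, height: int, width: int, classes: int) -> int:
    # Flatten the C x H x W feature map, then a fully connected layer to `classes` outputs.
    return channels * height * width * classes + classes  # weights + biases


def conv_head_params(channels: int, classes: int) -> int:
    # Assumed replacement: 1x1 convolution from C to `classes` channels;
    # the global average pooling that follows has no parameters.
    return channels * classes + classes  # weights + biases


def conv_head_forward(fmap: List[Matrix], weights: Matrix, bias: List[float]) -> List[float]:
    """Apply a 1x1 convolution (weights[k][c], bias[k]) and then global average pooling.

    fmap[c][h][w] is a C x H x W feature map; returns one logit per class k.
    """
    channels = len(fmap)
    height, width = len(fmap[0]), len(fmap[0][0])
    logits = []
    for k in range(len(weights)):
        total = 0.0
        for h in range(height):
            for w in range(width):
                # 1x1 convolution: a per-position weighted sum across channels plus bias
                total += sum(weights[k][c] * fmap[c][h][w] for c in range(channels)) + bias[k]
        # Global average pooling over all H x W positions
        logits.append(total / (height * width))
    return logits


if __name__ == "__main__":
    # Hypothetical sizes, not taken from the paper: 64 channels, 10 x 10 map, 100 output classes.
    print(dense_head_params(64, 10, 10, 100))  # 640100
    print(conv_head_params(64, 100))           # 6500
```

Under these assumed sizes the convolutional head needs roughly 1/98 of the dense head's parameters, because its weight count no longer scales with the spatial size H x W of the feature map.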
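Keywords: acoustic modeling; three-layer structural optimization; convolutional neural network; speech recognition; recognition rate; generalization performance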
CLC number: TN912.3 [Electronics and Telecommunications: Communication and Information Systems]