Authors: JI Changpeng[1], TONG Tingting, DAI Wei
Affiliation: [1] School of Electronic and Information Engineering, Liaoning Technical University, Huludao 125105, China
Source: Journal of Applied Acoustics (《应用声学》), 2024, No. 4, pp. 892-899 (8 pages)
Funding: Liaoning Provincial Department of Science and Technology project (2019-ZD-0038).
Abstract: To address the lack of dialect databases and the low accuracy of recognition models in speech emotion recognition, a speech emotion database for the Liaoxi dialect was built, and a speech emotion recognition model based on a lightweight network with an attention mechanism is proposed. The model consists of four parts: a feature combination network, a CBAM attention mechanism, a deep convolutional network, and an output layer. Three parallel convolutions of different sizes extract shallow speech features, which are then concatenated. A CBAM attention module fuses spatial and channel features. The fused features are fed into the deep convolutional network, which extracts deep speech features and outputs a multi-dimensional feature vector. The output layer then classifies the emotion of the speech. The model was validated on IEMOCAP, Emo-DB, and the self-built Liaoxi dialect speech emotion database, achieving accuracies of 82.5%, 96.2%, and 90.8%, respectively. Experimental results show that, compared with other deep learning models, the proposed model achieves a higher recognition rate with fewer parameters.
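The abstract names the building blocks but gives no layer sizes, so the following is only a rough, dependency-free sketch of the "concatenate parallel branches, then apply CBAM" step. It follows the commonly described CBAM formulation (channel attention followed by spatial attention) and is not the authors' implementation. Everything concrete here is an assumption made for illustration: the function names, the random weights, the reduction ratio of 2, the 3x3 spatial kernel, and the random stand-in maps used in place of real convolution outputs.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(feat, w1, w2):
    """One weight per channel. feat is a list of C maps, each H x W.
    w1 (C//r x C) and w2 (C x C//r) form a shared two-layer MLP (hypothetical weights)."""
    def mlp(v):
        hidden = [max(0.0, sum(w * x for w, x in zip(row, v))) for row in w1]  # ReLU
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feat]  # global average pool
    mx = [max(map(max, ch)) for ch in feat]                            # global max pool
    return [sigmoid(a + m) for a, m in zip(mlp(avg), mlp(mx))]

def spatial_attention(feat, kernel):
    """One weight per location. kernel is 2 x k x k and is convolved (zero padding)
    over the channel-wise average and channel-wise max maps."""
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    avg = [[sum(feat[c][i][j] for c in range(C)) / C for j in range(W)] for i in range(H)]
    mx = [[max(feat[c][i][j] for c in range(C)) for j in range(W)] for i in range(H)]
    k = len(kernel[0])
    pad = k // 2
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            s = 0.0
            for m, plane in enumerate((avg, mx)):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < H and 0 <= x < W:
                            s += kernel[m][di][dj] * plane[y][x]
            out[i][j] = sigmoid(s)
    return out

def cbam(feat, w1, w2, kernel):
    """Scale by channel attention first, then by spatial attention."""
    ca = channel_attention(feat, w1, w2)
    feat = [[[v * ca[c] for v in row] for row in ch] for c, ch in enumerate(feat)]
    sa = spatial_attention(feat, kernel)
    return [[[v * sa[i][j] for j, v in enumerate(row)] for i, row in enumerate(ch)]
            for ch in feat]

def rand_map(c, h, w):
    return [[[random.uniform(-1, 1) for _ in range(w)] for _ in range(h)] for _ in range(c)]

random.seed(0)
# Stand-ins for the outputs of three parallel convolution branches (assumed 2 channels each).
branches = [rand_map(2, 4, 5) for _ in range(3)]
feat = [ch for branch in branches for ch in branch]  # concatenate along the channel axis
C, r = len(feat), 2                                   # r: assumed reduction ratio
w1 = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C // r)]
w2 = [[random.uniform(-1, 1) for _ in range(C // r)] for _ in range(C)]
kernel = [[[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)] for _ in range(2)]
refined = cbam(feat, w1, w2, kernel)
print(len(refined), len(refined[0]), len(refined[0][0]))
```

Because both attention maps pass through a sigmoid, every output value is the input value scaled by a factor between 0 and 1, and the feature shape is unchanged. In a trained model the weights would be learned, and the refined features would then go to the deep convolutional network described in the abstract.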