Authors: LI Qianqian (李倩倩), WANG Weixing (王卫星)[1], YANG Qin (杨勤)[1], CHEN Zhijiu (陈治灸), QIN Qing (秦晴); School of Mechanical Engineering, Guizhou University, Guiyang 550025
Source: Computer & Digital Engineering (《计算机与数字工程》), 2023, No. 3, pp. 695-699 (5 pages)
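Funding: Supported by the Guizhou Provincial Science and Technology Foundation project "Research on Recognition of Typical Movements in Indigenous Ethnic Dance Based on Depth Images" (No. 黔科合基础[2020]1Y262); the Guizhou Provincial Department of Education Young Science and Technology Talent Growth Project "Research on Emotional Soundtrack Technology for Digital Animation Products Based on Linguistic-Value Computation" (No. 黔教合KY字[2018]112); and the Guizhou University Introduced Talent Project "Research on Semantics-Driven Emotion Recognition Technology for Music and Images" (No. 贵大人基合字(2018)16号).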
Abstract: Emotions usually change gradually within a given context. Most current research on audio-visual emotion recognition, however, fuses static facial image features with speech features, ignoring both the temporal relationships within video image sequences and the role of posture. This paper therefore combines a convolutional neural network (VGG) with a long short-term memory network (LSTM) to build a deep-neural-network-based audio-visual multimodal emotion recognition model that integrates expression, posture, and speech features. First, VGG extracts visual features from the face images and posture images; LSTM then extracts temporal features from the face and posture image sequences, while openSMILE extracts audio features. Finally, a DNN concatenates and fuses the extracted face, posture, and audio features and performs emotion classification. Experiments show that the model achieves a better recognition rate than methods that fuse static facial image features with speech features, and that adding posture features raises accuracy by a further 6.1%.
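The final fusion step described in the abstract (concatenating face, posture, and audio features, then classifying with a DNN) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the feature vectors stand in for the VGG+LSTM and openSMILE outputs, and all dimensions, the single hidden layer, the four emotion classes, and the random untrained weights are assumptions chosen only to show the data flow.

```python
import math
import random


def dense(x, weights, bias):
    """Fully connected layer: computes W @ x + b on plain Python lists."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(x):
    m = max(x)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_out, n_in):
    """Random, untrained weights; only the shapes matter in this sketch."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def fuse_and_classify(face_feat, posture_feat, audio_feat, hidden, out):
    """Concatenate the three modality vectors, then classify with a small DNN."""
    fused = face_feat + posture_feat + audio_feat  # feature-level concatenation
    h = relu(dense(fused, *hidden))
    return softmax(dense(h, *out))


# Hypothetical sizes (assumptions, not taken from the paper).
FACE_DIM, POSTURE_DIM, AUDIO_DIM = 8, 8, 6
HIDDEN_DIM, NUM_CLASSES = 16, 4

rng = random.Random(0)
# Placeholders for the per-clip features that VGG+LSTM (face, posture)
# and openSMILE (audio) would produce.
face_feat = [rng.random() for _ in range(FACE_DIM)]
posture_feat = [rng.random() for _ in range(POSTURE_DIM)]
audio_feat = [rng.random() for _ in range(AUDIO_DIM)]

hidden = init_layer(rng, HIDDEN_DIM, FACE_DIM + POSTURE_DIM + AUDIO_DIM)
out = init_layer(rng, NUM_CLASSES, HIDDEN_DIM)

probs = fuse_and_classify(face_feat, posture_feat, audio_feat, hidden, out)
predicted = max(range(NUM_CLASSES), key=probs.__getitem__)
print("class probabilities:", probs)
print("predicted class index:", predicted)
```

Dropping `posture_feat` from the concatenation (and shrinking the hidden layer's input size to match) would give the face-plus-audio configuration that the abstract compares against.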
Keywords: deep learning; emotion recognition; visual features; temporal features; feature fusion
Classification: TP39 [Automation and Computer Technology: Computer Application Technology]