Bimodal Emotion Recognition Based on Dynamic Features of Speech and Video

Authors: LIU Xi-chen (刘浠辰); JIANG Nan (姜囡); DU Fu-yao (杜扶遥)

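Affiliations: [1] College of Public Security Information Technology and Intelligence, Criminal Investigation Police University of China, Shenyang, Liaoning 110854, China; [2] Key Laboratory of Evidence Science, Ministry of Education, China University of Political Science and Law, Beijing 100088, China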

Source: Computer Simulation (《计算机仿真》), 2025, No. 2, pp. 215-220 (6 pages)

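Funding: Public Security Discipline Basic Theory Research Innovation Program (Research on the Basic Theory and Discipline System of Security Technology and Engineering, 2022XKGJ0110); Joint Open Fund of the Liaoning Provincial Department of Science and Technology and Open Fund of the State Key Laboratory of Robotics (2020-KF-12-11); Open Fund of the Key Laboratory of Evidence Science, Ministry of Education (China University of Political Science and Law) (2021KFKT09); Fundamental Research Funds for the Central Universities (3242019010); Natural Science Foundation of Liaoning Province (2019-ZD-0168); Key Research Project of the Ministry of Education (E-AQGABQ20202710).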

Abstract: To address problems such as missing features in speech-only (unimodal) emotion recognition, a bimodal emotion recognition method based on the fusion of speech and video dynamic features is proposed. The method also addresses the loss of temporal features that occurs when emotion recognition relies on static image features. Because human motion in video can fully reflect emotional characteristics, deep features of human motion are extracted as the video dynamic features. The number of MFCC coefficients is varied to analyze how the number of speech features affects emotion recognition. Mixed MFCC and fundamental-frequency (pitch) features are fed into a bidirectional LSTM network to obtain deep speech features. On the IEMOCAP dataset, emotion recognition using each of the two unimodal features is compared with the proposed bimodal method. The results show that the proposed bimodal dynamic-feature method improves the recognition rate by 9.6% and 21.1%, respectively, and that the recognition rates improve significantly when the number of MFCC coefficients is set to 40.
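The feature handling described in the abstract can be illustrated with a minimal sketch. It shows only the data layout: each frame's MFCC vector is extended with that frame's fundamental-frequency value, and a speech vector and a video vector are joined into one bimodal vector. The function names, the toy values, the use of 0.0 for unvoiced frames, and the choice of plain concatenation as the fusion operator are all assumptions made for illustration; the abstract does not state how the authors fuse the two modalities, and the BiLSTM and video networks are not reproduced here.

```python
from typing import List, Sequence


def mix_mfcc_and_pitch(mfcc_frames: Sequence[Sequence[float]],
                       f0_per_frame: Sequence[float]) -> List[List[float]]:
    """Append each frame's F0 value to that frame's MFCC vector (hypothetical helper)."""
    if len(mfcc_frames) != len(f0_per_frame):
        raise ValueError("MFCC and F0 sequences must have the same number of frames")
    return [list(mfcc) + [f0] for mfcc, f0 in zip(mfcc_frames, f0_per_frame)]


def fuse_by_concatenation(speech_vec: Sequence[float],
                          video_vec: Sequence[float]) -> List[float]:
    """Join two modality vectors end to end (fusion operator assumed, not from the paper)."""
    return list(speech_vec) + list(video_vec)


# Toy data: 3 frames with 40 MFCC coefficients each (placeholder zeros),
# plus one F0 value per frame. 40 matches the coefficient count in the abstract.
N_MFCC = 40
mfcc = [[0.0] * N_MFCC for _ in range(3)]
f0 = [110.0, 112.5, 0.0]  # 0.0 marks an unvoiced frame (convention assumed here)

# Each mixed frame has N_MFCC + 1 values; this sequence is what a
# sequence model such as a bidirectional LSTM would consume.
mixed = mix_mfcc_and_pitch(mfcc, f0)

# Stand-ins for the deep speech and video feature vectors.
fused = fuse_by_concatenation([0.1, 0.2], [0.3, 0.4, 0.5])
```

With 40 MFCC coefficients, each mixed frame in this sketch has 41 values, so changing the coefficient count changes the input width of whatever sequence model follows.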

Keywords: bimodal; video dynamic features; speech features; feature fusion; emotion recognition

Classification: TP391.9 [Automation and Computer Technology: Computer Application Technology]
