Authors: SHI Shuo, QIN Jia-jun, YU Yang, HAO Xiao-ke (School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China)
Affiliation: [1] School of Artificial Intelligence and Data Science, Hebei University of Technology, Tianjin 300401
Source: Acta Electronica Sinica (《电子学报》), 2024, No. 8, pp. 2824-2835 (12 pages)
Funding: National Natural Science Foundation of China (No. 61806071, No. 62102129); Natural Science Foundation of Hebei Province (No. F2020202025, No. F2021202030).
Abstract: Audio-visual bimodal emotion recognition is a research hotspot in affective computing. Current emotion recognition methods cannot extract local and global video features simultaneously, fuse multimodal data only in simple ways, and use loss functions that do not focus on misclassified samples during model optimization; together these problems limit recognition accuracy. This paper proposes an audio-visual emotion recognition method based on an improved ConvMixer and a focal loss function with dynamic weights. Spatial and temporal adjacency matrices replace the depthwise separable convolution in ConvMixer to extract global and local features of the video in both the spatial and temporal domains. A cross-modal temporal attention module with a symmetric structure captures the temporal correlation between modalities and improves feature fusion. A focal loss with dynamic weights is computed from the confusion matrix, increasing the share of misclassified samples in the loss in a class-differentiated way to optimize the model parameters. Experimental results on public datasets show that the proposed method extracts representative features, effectively optimizes the network structure, and improves emotion recognition accuracy.
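The abstract does not state the exact weighting formula, so the following is only a minimal sketch of one way a confusion-matrix-driven focal loss could work. It assumes per-class weights alpha_c = 1 + (1 - recall_c), read off the confusion matrix rows (rows = true class), plugged into the standard focal loss -alpha_y * (1 - p_y)^gamma * log(p_y). The function names, the weighting rule, and the gamma = 2 default are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int) -> List[List[int]]:
    """Rows index the true class, columns the predicted class."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def dynamic_class_weights(cm: List[List[int]]) -> List[float]:
    """Hypothetical rule (assumption): alpha_c = 1 + (1 - recall_c).

    Classes that are misclassified more often get a larger weight.
    """
    weights = []
    for c, row in enumerate(cm):
        total = sum(row)
        recall = row[c] / total if total else 1.0  # unseen class: no extra weight
        weights.append(1.0 + (1.0 - recall))
    return weights


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def weighted_focal_loss(batch_logits: Sequence[Sequence[float]], targets: Sequence[int],
                        weights: Sequence[float], gamma: float = 2.0) -> float:
    """Mean of -alpha_y * (1 - p_y)^gamma * log(p_y) over the batch."""
    total = 0.0
    for logits, y in zip(batch_logits, targets):
        p_t = max(softmax(logits)[y], 1e-12)  # clamp to avoid log(0)
        total += -weights[y] * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(targets)
```

For example, with true labels [0, 0, 1, 1] and predictions [0, 1, 1, 1], class 0 has recall 0.5, so its samples get weight 1.5 while class 1 keeps weight 1.0. In this sketch, recomputing the weights from a fresh confusion matrix after each epoch is what makes them "dynamic"; with gamma = 0 and all weights equal to 1 the loss reduces to ordinary cross-entropy.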
Keywords: emotion recognition; ConvMixer; attention mechanism; multimodal feature fusion; focal loss function
CLC number: TP391.4 [Automation and Computer Technology: Computer Application Technology]