Authors: DONG Yongfeng (董永峰); SU Haiyang (苏海洋); LIU Bin (刘斌) [2]; TAO Jianhua (陶建华)
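Affiliations: [1] School of Artificial Intelligence and Data Science, Hebei University of Technology, Tianjin 300401, China; [2] National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China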
Source: Journal of Signal Processing (《信号处理》), 2021, No. 5, pp. 885-892 (8 pages)
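Funding: National Key Research and Development Program of China (2017YFB1002804); Key Program of the National Natural Science Foundation of China (61831022, 61771472, 61901473, 61902106); Natural Science Foundation of Tianjin (19JCZDJC40000); Natural Science Foundation of Hebei Province (F2020202028).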
Abstract: In recent years, emotion recognition has become a hot research topic in human-computer interaction, and multimodal dimensional emotion recognition, which can detect subtle emotional changes, has attracted growing attention. A central question in multimodal dimensional emotion recognition is how to fuse emotional information from different modalities effectively. Feature-level fusion faces problems of effective feature extraction and modality synchronization, while decision-level fusion faces the problem of relating feature information across modalities. To address these problems, this paper adopts a model-level fusion strategy and proposes a multimodal dimensional emotion recognition method based on the multi-head attention mechanism (Transformer). An audio model, a video model, and a multimodal fusion model are constructed to learn deep features from the information streams, and the result is fed into a bidirectional long short-term memory (BiLSTM) network to obtain the final emotion prediction. Compared with the baseline methods, the proposed method achieves the best performance on both arousal and valence; it captures emotional information effectively at a high level and thereby fuses audio and video information more effectively.
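The abstract names multi-head attention as the fusion mechanism but gives no architectural detail. The following is only a minimal NumPy sketch of generic multi-head attention used cross-modally, in which audio frames attend to video frames. It is not the authors' model: the function names, the feature sizes (50 frames, 64 dimensions, 4 heads), the choice of audio as query, and the random projection matrices standing in for learned weights are all assumptions made for illustration. The BiLSTM stage is omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(query, key, value, num_heads, rng):
    """Scaled dot-product multi-head attention.

    query: (T_q, d); key, value: (T_k, d).
    Returns the attended features (T_q, d) and the attention
    weights (num_heads, T_q, T_k).
    """
    t_q, d = query.shape
    assert d % num_heads == 0, "feature size must be divisible by num_heads"
    d_h = d // num_heads

    # Assumption: random matrices stand in for learned projection weights.
    w_q, w_k, w_v, w_o = (
        rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4)
    )

    def split_heads(x):
        # (T, d) -> (num_heads, T, d_h)
        return x.reshape(x.shape[0], num_heads, d_h).transpose(1, 0, 2)

    q = split_heads(query @ w_q)
    k = split_heads(key @ w_k)
    v = split_heads(value @ w_v)

    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_h)  # (heads, T_q, T_k)
    weights = softmax(scores, axis=-1)
    heads = weights @ v                               # (heads, T_q, d_h)

    # Concatenate the heads back to (T_q, d) and apply the output projection.
    merged = heads.transpose(1, 0, 2).reshape(t_q, d)
    return merged @ w_o, weights

rng = np.random.default_rng(0)
audio = rng.standard_normal((50, 64))  # hypothetical audio features
video = rng.standard_normal((50, 64))  # hypothetical video features

# Assumption: audio frames act as queries over the video frames.
fused, weights = multi_head_attention(audio, video, video, num_heads=4, rng=rng)
print(fused.shape, weights.shape)
```

In a full pipeline of the kind the abstract describes, a sequence such as `fused` would then be passed to a recurrent layer to regress arousal and valence per frame.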
Keywords: dimensional emotion recognition; multimodal emotion fusion; model-level fusion; multi-head attention mechanism
Classification code: TP391.4 [Automation and Computer Technology: Computer Application Technology]