Authors: Ziwang FU, Feng LIU, Qing XU, Xiangling FU, Jiayin QI
Affiliations: [1] School of Computer Science (National Pilot Software Engineering School), Beijing University of Posts and Telecommunications, Beijing 100876, China; [2] Key Laboratory of Trustworthy Distributed Computing and Service (BUPT), Beijing 100876, China; [3] Shanghai International School of Chief Technology Officer, East China Normal University, Shanghai 200062, China; [4] School of Computer Science and Technology, East China Normal University, Shanghai 200062, China; [5] School of Cyberspace Security, Guangzhou University, Guangdong 510006, China
Source: Frontiers of Computer Science, 2024, Issue 4, pp. 39-47 (9 pages)
Funding: National Natural Science Foundation of China (Grant No. 72293583).
Abstract: Learning modality-fused representations and processing unaligned multimodal sequences are meaningful and challenging tasks in multimodal emotion recognition. Existing approaches use directional pairwise attention or a message hub to fuse the language, visual, and audio modalities. However, these fusion methods are often quadratic in complexity with respect to the modal sequence length, introduce redundant information, and are inefficient. In this paper, we propose an efficient neural network that learns modality-fused representations with a CB-Transformer (LMR-CBT) for multimodal emotion recognition from unaligned multimodal sequences. Specifically, we first perform feature extraction on the three modalities separately to obtain the local structure of each sequence. Then, we design a novel asymmetric transformer with cross-modal blocks (CB-Transformer) that enables complementary learning across modalities, divided mainly into local temporal learning, cross-modal feature fusion, and global self-attention representations. In addition, we splice the fused features with the original features to classify the emotions of the sequences. Finally, we conduct word-aligned and unaligned experiments on three challenging datasets: IEMOCAP, CMU-MOSI, and CMU-MOSEI. The experimental results show the superiority and efficiency of our proposed method in both settings. Compared with mainstream methods, our approach achieves state-of-the-art performance with a minimal number of parameters.
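The pipeline named in the abstract (per-modality feature extraction, an asymmetric cross-modal transformer, and classification over fused features spliced with the original ones) can be sketched roughly as below. This is a minimal PyTorch sketch based only on the abstract, not the authors' implementation: the feature dimensions (300/35/74, typical for these datasets), the language-centered fusion, the layer counts, and all module names are assumptions.

```python
import torch
import torch.nn as nn


class CBTransformerSketch(nn.Module):
    """Minimal sketch of the pipeline described in the abstract:
    per-modality feature extraction, cross-modal fusion, global
    self-attention, splicing with original features, classification.
    All dimensions and layer choices are illustrative assumptions."""

    def __init__(self, dims=(300, 35, 74), d_model=40, n_heads=4, n_classes=4):
        super().__init__()
        # (1) Local temporal feature extraction per modality; a 1D
        # convolution captures the local structure of each sequence.
        self.extract = nn.ModuleList(
            nn.Conv1d(d, d_model, kernel_size=3, padding=1) for d in dims
        )
        # (2) Cross-modal block: language features attend to the
        # concatenated visual/audio features (one possible reading
        # of the "asymmetric" design; the paper may differ).
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # (3) Global self-attention over the fused sequence.
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.global_attn = nn.TransformerEncoder(layer, num_layers=2)
        # (4) Classifier over the spliced (fused + original) features.
        self.classifier = nn.Linear(d_model * 4, n_classes)

    def forward(self, lang, vis, aud):
        # Inputs: (batch, seq_len_m, feat_dim_m); sequence lengths may
        # differ across modalities (unaligned setting).
        l, v, a = (
            conv(x.transpose(1, 2)).transpose(1, 2)
            for conv, x in zip(self.extract, (lang, vis, aud))
        )
        ctx = torch.cat([v, a], dim=1)           # non-verbal context
        fused, _ = self.cross_attn(l, ctx, ctx)  # cross-modal fusion
        fused = self.global_attn(fused)          # global self-attention
        # Splice fused and original features (mean-pooled over time).
        pooled = torch.cat([fused.mean(1), l.mean(1), v.mean(1), a.mean(1)], dim=1)
        return self.classifier(pooled)


# Example with unaligned dummy sequences (batch of 2).
model = CBTransformerSketch()
logits = model(torch.randn(2, 50, 300), torch.randn(2, 60, 35), torch.randn(2, 70, 74))
print(logits.shape)  # torch.Size([2, 4])
```

Note the asymmetry in this reading: a single cross-modal attention pass replaces pairwise attention between every modality pair, which is one way such a design can keep complexity and parameter count down, consistent with the abstract's efficiency claim.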
Keywords: modality-fused representations; cross-modal blocks; multimodal emotion recognition; unaligned multimodal sequences; computational affection
Classification: TP181 [Automation and Computer Technology - Control Theory and Control Engineering]