Authors: WU Liang-qing (吴良庆); ZHANG Dong (张栋) [1]; LI Shou-shan (李寿山) [1]; CHEN Ying (陈瑛) [2]
Affiliations: [1] School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China; [2] College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China
Source: Computer Science (《计算机科学》), 2019, No. 11, pp. 284-290 (7 pages)
Fund: Supported by the National Natural Science Foundation of China (61331011, 61375073)
Abstract: Emotion analysis is a fundamental task in natural language processing (NLP), and research on the single text modality is already fairly mature. However, for multi-modal content such as video, which combines three modalities (text, visual and acoustic), the additional modal information makes emotion analysis more challenging. To improve the performance of multi-modal emotion recognition, this paper proposes a neural network approach based on multi-task learning that considers intra-modality information while fully modeling the interactions among the three modalities. Specifically, the three kinds of modal information are first preprocessed to extract the corresponding feature representations. Second, a private bidirectional LSTM is built for each modality to capture its intra-modality dynamics, and shared bidirectional LSTMs are built for each bi-modal combination (text-visual, text-acoustic and visual-acoustic) to learn the dynamic interactions between pairs of modalities. Next, a shared bidirectional LSTM over all three modalities captures the tri-modal dynamic interactions. Finally, the intra-modality information and the multi-modal interaction information obtained from these layers are fused, and the final emotion recognition results are produced through fully-connected layers and a Sigmoid layer. In uni-modal experiments, the proposed approach outperforms the state-of-the-art methods on the text, visual and acoustic modalities by an average of 6.25%, 0.75% and 2.38% across all emotions, respectively. In multi-modal experiments, it achieves an average accuracy of 65.67% on the emotion recognition task, a clear improvement over the other baselines.
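The abstract walks through a concrete pipeline: private BiLSTMs per modality, shared BiLSTMs for each bi-modal pair and for the tri-modal combination, fusion of all representations, then fully-connected and Sigmoid layers. The following PyTorch sketch shows one plausible wiring of that pipeline. The feature dimensions, the fusion by concatenation, the last-time-step pooling and the number of emotion labels are illustrative assumptions, not the authors' implementation.

# A minimal sketch of the architecture described in the abstract, assuming
# time-aligned modality features and concatenation-based fusion. Dimensions
# and the emotion count are hypothetical; this is not the paper's code.
import torch
import torch.nn as nn

class MultiModalEmotionNet(nn.Module):
    def __init__(self, dims, hidden=64, num_emotions=6):
        super().__init__()
        # dims: feature size per modality, e.g. {"text": 300, "visual": 35, "acoustic": 74}
        self.modalities = list(dims)
        # Private BiLSTM per modality: intra-modality dynamics.
        self.private = nn.ModuleDict({
            m: nn.LSTM(dims[m], hidden, batch_first=True, bidirectional=True)
            for m in dims
        })
        # Shared BiLSTM per bi-modal pair, fed the concatenated pair features.
        self.pairs = [("text", "visual"), ("text", "acoustic"), ("visual", "acoustic")]
        self.bimodal = nn.ModuleDict({
            f"{a}_{b}": nn.LSTM(dims[a] + dims[b], hidden,
                                batch_first=True, bidirectional=True)
            for a, b in self.pairs
        })
        # One shared BiLSTM over all three modalities: tri-modal dynamics.
        self.trimodal = nn.LSTM(sum(dims.values()), hidden,
                                batch_first=True, bidirectional=True)
        # 3 private + 3 bi-modal + 1 tri-modal summaries, each 2*hidden wide.
        fused = (3 + 3 + 1) * 2 * hidden
        # One sigmoid output per emotion: each emotion is its own binary task.
        self.classifier = nn.Sequential(nn.Linear(fused, hidden), nn.ReLU(),
                                        nn.Linear(hidden, num_emotions), nn.Sigmoid())

    @staticmethod
    def _last(output):
        # Final time step of the BiLSTM output as the sequence summary
        # (a simplification of whatever pooling the paper uses).
        return output[:, -1, :]

    def forward(self, feats):
        # feats: dict of (batch, seq_len, dim) tensors, aligned across modalities.
        reps = [self._last(self.private[m](feats[m])[0]) for m in self.modalities]
        for a, b in self.pairs:
            pair_in = torch.cat([feats[a], feats[b]], dim=-1)
            reps.append(self._last(self.bimodal[f"{a}_{b}"](pair_in)[0]))
        tri_in = torch.cat([feats[m] for m in self.modalities], dim=-1)
        reps.append(self._last(self.trimodal(tri_in)[0]))
        # Fuse all intra- and inter-modality summaries, then classify.
        return self.classifier(torch.cat(reps, dim=-1))

# Toy usage with random features (batch of 2, sequence length 20).
dims = {"text": 300, "visual": 35, "acoustic": 74}
model = MultiModalEmotionNet(dims)
feats = {m: torch.randn(2, 20, d) for m, d in dims.items()}
print(model(feats).shape)  # torch.Size([2, 6]) -- per-emotion probabilities

Treating each emotion as a separate sigmoid output reflects the multi-task framing in the abstract; the actual feature extractors, sequence alignment and loss weighting are described in the paper itself.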
Classification Code: TP391 [Automation and Computer Technology - Computer Application Technology]