Survey of multimodal deep learning (多模态深度学习综述)  Cited by: 43


Authors: Liu Jianwei (刘建伟)[1]; Ding Xihao (丁熙浩); Luo Xionglin (罗雄麟)[1]

Affiliation: [1] Dept. of Automation, China University of Petroleum (Beijing), Beijing 102249, China

Source: Application Research of Computers (《计算机应用研究》), 2020, Issue 6, pp. 1601-1614 (14 pages)

Abstract: Written at an early stage in the development of multimodal deep learning, this paper surveys the current state of the field, identifies the problems that recur when implementing multimodal deep learning across different modality combinations and learning objectives, classifies those problems, and describes the methods used to solve each class. Specifically, it covers multimodal learning involving natural language, vision, and audition, and considers research on language translation, event detection, information description, emotion recognition, voice recognition and synthesis, and multimedia retrieval. The common problems are divided into four types: multimodal representation, multimodal interpretation, multimodal fusion, and multimodal alignment. Each type is further sub-categorized and discussed, and the neural network models developed to address it are listed. Finally, the paper describes practical multimodal systems and the datasets and evaluation criteria commonly used in multimodal deep learning research, and outlines likely directions for future development.

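Keywords: multimodal; deep learning; multiple neural networks; multimodal representation; multimodal interpretation; multimodal fusion; multimodal alignment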

Classification code: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

 
