基于深度学习的跨模态检索综述  (Cited by: 20)

Survey on deep learning based cross-modal retrieval

Authors: Yin Qiyue (尹奇跃); Huang Yan (黄岩)[1]; Zhang Junge (张俊格)[1]; Wu Shu (吴书)[1]; Wang Liang (王亮)[1] (Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China)

Affiliation: [1] Institute of Automation, Chinese Academy of Sciences, Beijing 100190

Source: Journal of Image and Graphics (《中国图象图形学报》), 2021, No. 6, pp. 1368-1388 (21 pages)

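Abstract: With the rapid growth of multi-modal data, cross-modal retrieval has attracted wide attention from researchers. It uses data of one modality as the query to retrieve data of other modalities; for example, a user can use text to retrieve images and/or videos. Because the query and its retrieved results are represented in different modalities, measuring similarity across modalities is the main challenge of cross-modal retrieval. With the spread of deep learning and its notable results in computer vision, natural language processing, and related fields, researchers have proposed a series of deep learning based cross-modal retrieval methods that greatly alleviate the challenge of measuring similarity across modalities; this paper refers to them as deep cross-modal retrieval. The paper reviews representative work on deep cross-modal retrieval and, based on the cross-modal information provided, divides the methods into three classes: those based on one-to-one correspondence between cross-modal data, those based on similarity between cross-modal data, and those based on semantic annotation of cross-modal data. In general, the amount of cross-modal information provided increases across these three classes, and the more information is available for learning, the better the retrieval performance. Across the classes, seven mainstream techniques are covered: canonical correlation analysis, one-to-one correspondence preservation, metric learning, likelihood analysis, learning to rank, semantic prediction, and adversarial learning. Each class contains some of these key techniques, and the paper describes representative methods for each in detail. It also compares the techniques under the different kinds of cross-modal information, to show how their focus and usage differ depending on the level of information provided. To support the evaluation of cross-modal retrieval methods, the paper summarizes several representative cross-modal retrieval datasets. Finally, it discusses open problems in deep cross-modal retrieval and future research directions.

Over the last decade, different types of media data such as texts, images, and videos have grown rapidly on the internet. Different types of data are used to describe the same events or topics. For example, a web page usually contains not only a textual description but also images or videos illustrating the same content. Such different types of data are referred to as multi-modal data, which inspire many applications, e.g., multi-modal retrieval, hot topic detection, and personalized recommendation. Nowadays, mobile devices and social websites (e.g., Facebook, Flickr, YouTube, and Twitter) are in widespread use, and demand for cross-modal data retrieval is growing. Accordingly, cross-modal retrieval has attracted considerable attention. One type of data is used as the query to retrieve relevant data of another type. For example, a user can use a text to retrieve relevant pictures and/or videos. The query and its retrieved results can have different modalities; thus, measuring the content similarity between different modalities of data, i.e., reducing the heterogeneity gap, remains a challenge. With the rapid development of deep learning techniques, various deep cross-modal retrieval approaches have been proposed to alleviate this problem, and promising performance has been obtained. We aim to review and organize representative methods for deep learning based cross-modal retrieval. We first classify these approaches into three main groups based on the cross-modal information provided, i.e.: 1) co-occurrence information, 2) pairwise information, and 3) semantic information. Co-occurrence information based methods use only co-occurrence information to learn common representations across multi-modal data, where co-occurrence information means that if different modalities of data co-exist in a multi-modal document, then they share the same semantics. Pairwise information based methods use similar pairs and dissimilar pairs to learn the common representations. A similarity matrix for all mo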

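Keywords: cross-modal retrieval; cross-modal hashing; deep learning; common representation learning; adversarial learning; likelihood analysis; learning to rank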

Classification number: TP37 [Automation and Computer Technology: Computer System Architecture]
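
The abstract describes two ideas that a small example may make more concrete: ranking items of one modality against a query from another by similarity in a common representation space, and learning that space from similar and dissimilar pairs. The following Python sketch is illustrative only and is not taken from the surveyed paper. The embedding vectors, the item names, the function names, and the choice of cosine similarity and a margin-based contrastive loss are all assumptions made for the example; real systems would obtain the embeddings from trained deep networks.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query_vec, gallery, top_k=2):
    """Rank gallery items (id -> vector) by similarity to the query vector.

    Query and gallery may come from different modalities, as long as both
    have already been mapped into the same (assumed) common space.
    """
    scored = sorted(
        gallery.items(),
        key=lambda kv: cosine(query_vec, kv[1]),
        reverse=True,
    )
    return [item_id for item_id, _ in scored[:top_k]]


def contrastive_loss(u, v, similar, margin=1.0):
    """Pairwise loss on Euclidean distance d between two embeddings.

    Similar pairs are penalized by d^2 (pulled together); dissimilar pairs
    are penalized only while they are closer than `margin` (pushed apart).
    """
    d = math.dist(u, v)
    return d * d if similar else max(0.0, margin - d) ** 2


# Hypothetical hand-made embeddings: one text query, three images.
text_query = [0.9, 0.1, 0.0]
image_gallery = {
    "img_dog": [1.0, 0.0, 0.1],
    "img_car": [0.0, 1.0, 0.0],
    "img_puppy": [0.7, 0.6, 0.0],
}

ranking = retrieve(text_query, image_gallery, top_k=2)
print(ranking)
```

In this toy setup the text query is closest to `img_dog`, so that item is ranked first. The loss function shows only the objective that pairwise-information methods might minimize; no training loop is included.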

 
