
Typical Concept-Driven Modality-Missing Deep Cross-Modal Retrieval


Authors: Xia Xinyu; Zhu Lei; Nie Xiushan; Dong Guohua; Zhang Huaxiang

Affiliations: [1] School of Information Science and Engineering, Shandong Normal University, Jinan 250358; [2] School of Computer Science and Technology, Shandong Jianzhu University, Jinan 250101; [3] Institute of Military Cognition and Brain Sciences, Beijing 100850

Source: Journal of Computer-Aided Design & Computer Graphics, 2025, No. 3, pp. 519-532 (14 pages)

Funding: National Natural Science Foundation of China (62172263); Natural Science Foundation of Shandong Province (ZR2020YQ47, ZR2019QF002); Youth Innovation Team Foundation of Shandong Higher Education Institutions (2019KJN040).

Abstract: Cross-modal retrieval takes data of one modality as a query and retrieves semantically relevant data in another modality. Most existing cross-modal retrieval methods are designed for scenarios with complete modality data; their ability to handle missing-modality data remains limited. To address this, we propose a typical concept-driven modality-missing deep cross-modal retrieval model. Specifically, we first propose a multi-modal Transformer integrated with multi-modal pre-training networks, which can fully capture fine-grained multi-modal semantic interactions even when modalities are missing, extract multi-modal fusion semantics, construct a cross-modal subspace, and at the same time guide the learning of typical multi-modal concepts. Then, the typical concepts are used as the keys and values of cross-attention to drive the training of the modality mapping network, so that the network can adaptively perceive the multi-modal semantic concepts implicit in the query modality data, generate cross-modal retrieval features, and fully preserve the multi-modal fusion semantics extracted during training. Experimental results on four benchmark cross-modal retrieval datasets, Wikipedia, Pascal-Sentence, NUS-WIDE, and XmediaNet, show that the proposed model improves the mean average precision over the compared models by 1.7%, 5.1%, 1.6%, and 5.4%, respectively. The source code is available at https://gitee.com/MrSummer123/CPCMR.
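The core mechanism the abstract describes, using a shared matrix of typical concepts as the keys and values of cross-attention over query-modality features, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that); the function name, shapes, and the single-head, non-learned form are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def concept_cross_attention(query_feats, concepts):
    """Illustrative single-head cross-attention.

    query_feats: (n_query, d) features from the available query modality.
    concepts:    (n_concepts, d) shared typical-concept matrix, used as
                 both keys and values, so the output is a concept-aware
                 re-expression of each query feature.
    Returns (attended, weights).
    """
    d = query_feats.shape[-1]
    # Scaled dot-product scores between queries and concept keys.
    scores = query_feats @ concepts.T / np.sqrt(d)   # (n_query, n_concepts)
    weights = softmax(scores, axis=-1)               # rows sum to 1
    attended = weights @ concepts                    # (n_query, d)
    return attended, weights

# Toy usage: 4 query features attending over 6 typical concepts.
rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))
c = rng.standard_normal((6, 8))
out, w = concept_cross_attention(q, c)
```

In the full model the queries, keys, and values would pass through learned projections and the attended features would feed the modality mapping network; the sketch only shows how the concept matrix lets a single-modality query recover multi-modal semantics.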

Keywords: deep cross-modal retrieval; missing modality; multi-modal Transformer; typical concepts; modality mapping network

Classification code: TP391.41 [Automation & Computer Technology — Computer Application Technology]

 
