Multi-Granularity Text-Image Alignment Based on a Multi-Head Attention Mechanism

Authors: WANG Hongbin [1,2], ZHANG Panpan [1,2], LI Huafeng

Affiliations: [1] Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500, China; [2] Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming, Yunnan 650500, China

Source: Journal of Kunming University of Science and Technology (Natural Science), 2023, No. 1, pp. 42-52 (11 pages)

Funding: National Natural Science Foundation of China (61966020, 61966021).

Abstract: Text-based person image search faces two main challenges: extracting fine-grained features from both text and images, and closing the gap between the text and image modalities. Because global features alone cannot represent the text and image modalities comprehensively, this paper proposes a multi-granularity text-image alignment method based on a multi-head attention mechanism. On top of global matching, the method also matches local image features against local text features. It applies multi-head attention to the local image features and the local text features to capture relationship information within each modality, and introduces an inter-modal relation module to capture relationship information between the two modalities, so that the extracted local image and text features align adaptively. Together these components improve overall performance on text-based person image search. Experiments on the public CUHK-PEDES dataset show that the model's overall performance is 3.0% higher than the baseline, demonstrating the effectiveness of the proposed model for text-based person image search.
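The abstract describes the architecture only at a high level. As a rough illustration, the following is a minimal NumPy sketch of how global matching, intra-modal multi-head self-attention, and an inter-modal attention step could be combined into a single text-image similarity score. It is not the authors' implementation: the function names (`multi_head_attention`, `multi_granularity_score`), mean pooling for global features, cosine similarity, the weighting parameter `alpha`, and the omission of learned projection matrices are all assumptions made for illustration. Plain cross-attention stands in for the paper's inter-modal relation module, whose exact form the abstract does not specify.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax: subtract the max before exponentiating.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention(q, k, v, num_heads):
    """Scaled dot-product attention split across num_heads heads.

    q: (n_q, d); k, v: (n_k, d); d must be divisible by num_heads.
    Assumption: learned Q/K/V/output projections are omitted (identity)
    to keep the sketch short.
    """
    n_q, d = q.shape
    n_k = k.shape[0]
    if d % num_heads != 0:
        raise ValueError("feature dim must be divisible by num_heads")
    d_h = d // num_heads
    # (tokens, d) -> (heads, tokens, d_h)
    qh = q.reshape(n_q, num_heads, d_h).transpose(1, 0, 2)
    kh = k.reshape(n_k, num_heads, d_h).transpose(1, 0, 2)
    vh = v.reshape(n_k, num_heads, d_h).transpose(1, 0, 2)
    scores = qh @ kh.transpose(0, 2, 1) / np.sqrt(d_h)  # (heads, n_q, n_k)
    out = softmax(scores, axis=-1) @ vh                  # (heads, n_q, d_h)
    # Concatenate the heads back into (n_q, d).
    return out.transpose(1, 0, 2).reshape(n_q, d)


def cosine(a, b, eps=1e-8):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def multi_granularity_score(img_local, txt_local, num_heads=4, alpha=0.5):
    """Hypothetical similarity combining global and local matching.

    img_local: (n_regions, d) local image features.
    txt_local: (n_phrases, d) local text features.
    alpha: hypothetical weight between global and local similarity.
    """
    # Intra-modal relations: self-attention within each modality.
    img_ctx = multi_head_attention(img_local, img_local, img_local, num_heads)
    txt_ctx = multi_head_attention(txt_local, txt_local, txt_local, num_heads)
    # Global matching on mean-pooled features (assumed pooling choice).
    global_sim = cosine(img_ctx.mean(axis=0), txt_ctx.mean(axis=0))
    # Inter-modal relations (stand-in): each text phrase attends over image
    # regions, yielding an adaptively aligned image feature per phrase.
    aligned_img = multi_head_attention(txt_ctx, img_ctx, img_ctx, num_heads)
    local_sim = float(np.mean([cosine(t, a) for t, a in zip(txt_ctx, aligned_img)]))
    return alpha * global_sim + (1 - alpha) * local_sim


# Usage with random stand-in features (no real model or dataset involved).
rng = np.random.default_rng(0)
img_parts = rng.standard_normal((6, 32))    # e.g. 6 image regions
txt_phrases = rng.standard_normal((5, 32))  # e.g. 5 text phrases
score = multi_granularity_score(img_parts, txt_phrases)
print(f"similarity: {score:.4f}")
```

Cross-attention from text phrases to image regions is one common way to let each local text feature select the image regions it describes; the relation module in the paper may be constructed differently, and a trained model would learn the projection matrices that this sketch leaves out.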

Keywords: cross-modal matching; global matching; multi-head attention mechanism; local image features; local text features

CLC number: TP391.41 [Automation and Computer Technology: Computer Application Technology]

 
