Image-text matching based on reading strategy and semantic alignment


Authors: GAN Fengmei (甘凤梅); XIA Ying (夏英)

Affiliations: [1] Key Laboratory of Tourism Multisource Data Perception and Decision, Ministry of Culture and Tourism, Chongqing 400065, P.R. China; [2] School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, P.R. China

Source: Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), 2025, No. 1, pp. 67-75 (9 pages)

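Funding: National Natural Science Foundation of China (41971365); Key Cooperation Project of the Chongqing Municipal Education Commission (HZ2021008); Ministry of Culture and Tourism Key Laboratory funded project (E020H2023005).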

Abstract: To address the image-text matching task in the cross-media computing domain, this paper proposes a reading-strategy and semantic alignment network (RSAN). A region feature enhancement module based on a Transformer and bidirectional gated recurrent units (Bi-GRU) is designed to generate image region features that carry semantic relationships, improving the accuracy of semantic alignment. A reading module containing an overview branch and a close-reading branch is designed to aggregate global and local alignments and so learn more accurate matching scores. Comprehensive experiments on the Flickr30K and MS-COCO datasets show that the RSAN model outperforms existing baseline models in both accuracy and efficiency.
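The idea of aggregating a global alignment with a local alignment can be illustrated with a minimal sketch. This is not the authors' RSAN implementation: the abstract does not say how the two branches are computed or combined, so the mean pooling, the best-region-per-word rule, the weighted sum, and the names `global_score`, `local_score`, `matching_score`, and `alpha` are all assumptions made for illustration. The Transformer and Bi-GRU feature enhancement is omitted, and region and word features are assumed to be given as plain vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_pool(vectors):
    """Average a list of equal-length vectors into a single vector."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def global_score(regions, words):
    """Overview-style score (assumed): compare the pooled image and pooled text vectors."""
    return cosine(mean_pool(regions), mean_pool(words))


def local_score(regions, words):
    """Close-reading-style score (assumed): each word keeps its best-matching
    region, and the per-word maxima are averaged."""
    best = [max(cosine(word, region) for region in regions) for word in words]
    return sum(best) / len(best)


def matching_score(regions, words, alpha=0.5):
    """Hypothetical aggregation: a weighted sum of the global and local scores."""
    return alpha * global_score(regions, words) + (1.0 - alpha) * local_score(regions, words)


# Toy features: two image regions and two candidate captions of two words each.
regions = [[1.0, 0.0], [0.0, 1.0]]
matched_words = [[1.0, 0.0], [0.0, 1.0]]
mismatched_words = [[-1.0, 0.0], [0.0, -1.0]]

score_matched = matching_score(regions, matched_words)
score_mismatched = matching_score(regions, mismatched_words)
```

In this toy setting the caption whose word vectors line up with the regions receives the higher score. A learned model would replace the fixed `alpha` and the hand-written similarity rules with trained components.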

Keywords: image-text matching; feature enhancement; semantic alignment; similarity computation

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
