Authors: Hu Mengnan [1,2,3]; Wang Rong; Zhang Wenjing [1]; Zhang Qi
Affiliations: [1] School of Information and Cyber Security, People's Public Security University of China, Beijing 100038; [2] Department of Public Order, Shandong Police College, Jinan 250200; [3] Public Security and Emergency Management Research Center, Shandong Police College, Jinan 250200
Source: Journal of Computer-Aided Design & Computer Graphics, 2025, No. 1, pp. 148-156 (9 pages)
Funding: Double First-Class Special Project in Security and Prevention Engineering, People's Public Security University of China (2023SYL08).
Abstract: To address the lack of sufficient cross-modal interaction between vision and language in referring image segmentation, and the differences in spatial and semantic information across objects of different sizes, this paper proposes a multi-scale referring image segmentation method based on a dual attention mechanism. First, different types of informative keywords in the language expression are used to strengthen the cross-modal alignment of visual and linguistic features, and a dual attention mechanism captures dependencies among multimodal features to realize both inter-modal and intra-modal interaction. Second, with language features as guidance, object-relevant visual information is aggregated from features at other levels to further enhance the feature representation. Then, a bidirectional ConvLSTM progressively integrates low-level spatial details and high-level semantics along bottom-up and top-down paths. Finally, atrous convolutions with different dilation rates fuse multi-scale information, improving the model's ability to perceive segmentation targets at different scales. Experiments on the UNC, UNC+, GRef, and ReferIt benchmark datasets show that the proposed method improves oIoU by 1.81, 1.26, 0.84, and 0.32 percentage points, respectively. Extensive ablation studies also validate the effectiveness of each component of the approach.
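The paper publishes no code on this record page; as orientation for the inter-modal interaction step described above, the following is a minimal sketch of generic scaled dot-product cross-attention from visual positions to language tokens. All names, shapes, and the toy data are illustrative assumptions, not the authors' implementation (which also includes intra-modal attention, ConvLSTM, and atrous fusion).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(visual, language):
    """Attend visual features (N_v x d) to language features (N_l x d).

    Each visual position aggregates word features weighted by its
    affinity to every word, yielding language-aware visual features
    of shape (N_v x d).
    """
    d = visual.shape[-1]
    scores = visual @ language.T / np.sqrt(d)   # (N_v, N_l) affinities
    weights = softmax(scores, axis=-1)          # distribution over words
    return weights @ language                   # aggregate word features

# Toy example: 4 visual positions, 3 words, feature dimension 8.
rng = np.random.default_rng(0)
v = rng.standard_normal((4, 8))
l = rng.standard_normal((3, 8))
out = cross_modal_attention(v, l)
print(out.shape)  # (4, 8)
```

In the full method this block would run in both directions (vision-to-language and language-to-vision) alongside self-attention within each modality, which is what the abstract calls inter-modal and intra-modal interaction.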
Keywords: referring image segmentation; cross-modal interaction; feature enhancement; attention mechanism; multi-scale fusion
Classification code: TP391.41 [Automation and Computer Technology — Computer Application Technology]