Authors: LIU Yu; GUO Yingchun; ZHU Ye; YU Ming
Affiliations: [1] School of Electronic and Information Engineering, Hebei University of Technology, Tianjin 300401, China; [2] School of Artificial Intelligence and Data Science, Hebei University of Technology, Tianjin 300401, China
Source: Chinese Journal of Liquid Crystals and Displays (《液晶与显示》), 2024, Issue 11, pp. 1494-1505 (12 pages)
Funding: National Natural Science Foundation of China Youth Program (No.62102129); National Natural Science Foundation of China General Program (No.62276088); Natural Science Foundation of Hebei Province (No.F2021202030, No.F2019202381, No.F2019202464)
Abstract: Few-shot image semantic segmentation aims to segment novel classes with only a few annotated examples. To address the insufficient mining of semantic information in existing methods, a few-shot image semantic segmentation method based on a dual cross-attention network is proposed. The method adopts a Transformer structure and uses dual cross-attention modules to learn long-range dependencies between multi-scale query and support features in both the channel and spatial dimensions. First, a channel cross-attention module is proposed and combined with a position cross-attention module to form the dual cross-attention module: the channel cross-attention module learns the channel-wise semantic relationships between query and support features, while the position cross-attention module captures the long-range contextual correlations between them. Then, stacking multiple dual cross-attention modules provides the query image with multi-scale interaction features that carry rich semantic information. Finally, an auxiliary supervision loss is introduced, and the multi-scale interaction features are connected to the decoder through upsampling and residual connections to obtain accurate segmentation of novel classes. The proposed method achieves 69.9% (1-shot) and 72.4% (5-shot) mIoU on the PASCAL-5i dataset, and 48.9% (1-shot) and 54.6% (5-shot) mIoU on the COCO-20i dataset, reaching state-of-the-art segmentation performance compared with mainstream methods.
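For illustration only, the following is a minimal PyTorch sketch of how a dual cross-attention block combining position (spatial) and channel cross-attention between query and support features might be organized. The module names, projections, scaling, and the simple additive fusion are assumptions made for this sketch and do not reproduce the authors' exact implementation, multi-scale arrangement, or auxiliary-loss design.

```python
# Minimal sketch of a dual cross-attention block, assuming query/support feature
# maps of shape (B, C, H, W). Illustrative only; not the paper's exact design.
import torch
import torch.nn as nn


class PositionCrossAttention(nn.Module):
    """Spatial cross-attention: query pixels attend to support pixels."""

    def __init__(self, channels):
        super().__init__()
        self.q_proj = nn.Conv2d(channels, channels, 1)
        self.k_proj = nn.Conv2d(channels, channels, 1)
        self.v_proj = nn.Conv2d(channels, channels, 1)
        self.scale = channels ** -0.5

    def forward(self, query_feat, support_feat):
        b, c, h, w = query_feat.shape
        q = self.q_proj(query_feat).flatten(2).transpose(1, 2)    # (B, HW, C)
        k = self.k_proj(support_feat).flatten(2)                  # (B, C, HW)
        v = self.v_proj(support_feat).flatten(2).transpose(1, 2)  # (B, HW, C)
        attn = torch.softmax(q @ k * self.scale, dim=-1)          # (B, HW, HW)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return out + query_feat                                   # residual connection


class ChannelCrossAttention(nn.Module):
    """Channel cross-attention: query channels attend to support channels."""

    def forward(self, query_feat, support_feat):
        b, c, h, w = query_feat.shape
        q = query_feat.flatten(2)                                  # (B, C, HW)
        k = support_feat.flatten(2)                                # (B, C, HW)
        v = support_feat.flatten(2)                                # (B, C, HW)
        attn = torch.softmax(q @ k.transpose(1, 2) * (h * w) ** -0.5, dim=-1)  # (B, C, C)
        out = (attn @ v).reshape(b, c, h, w)
        return out + query_feat                                    # residual connection


class DualCrossAttention(nn.Module):
    """Runs both branches on the query/support pair and fuses them (here, a sum)."""

    def __init__(self, channels):
        super().__init__()
        self.position = PositionCrossAttention(channels)
        self.channel = ChannelCrossAttention()

    def forward(self, query_feat, support_feat):
        return self.position(query_feat, support_feat) + self.channel(query_feat, support_feat)


if __name__ == "__main__":
    query = torch.randn(2, 64, 32, 32)
    support = torch.randn(2, 64, 32, 32)
    dca = DualCrossAttention(64)
    print(dca(query, support).shape)  # torch.Size([2, 64, 32, 32])
```

In this sketch, applying one such block per feature scale and collecting the outputs would yield multi-scale interaction features that a decoder could consume via upsampling and residual connections, as the abstract describes.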
Keywords: few-shot image semantic segmentation; Transformer structure; channel cross-attention; dual cross-attention; auxiliary loss
CLC Number: TP391.4 [Automation and Computer Technology: Computer Application Technology]