A Weakly Supervised Semantic Segmentation Method Based on Improved Conformer  

Authors: Xueli Shen, Meng Wang

Affiliation: [1] School of Software, Liaoning Technical University, Huludao, 125105, China

Source: Computers, Materials & Continua (in English), 2025, Issue 3, pp. 4631-4647, 17 pages

Abstract: In Weakly Supervised Semantic Segmentation (WSSS), methods based on image-level annotation struggle to accurately capture objects of varying sizes, lack sensitivity to image details, and incur high computational costs. To address these issues, we improve the dual-branch architecture of the Conformer as the backbone network for generating class activation maps, and propose a multi-scale, efficient weakly supervised semantic segmentation method based on the improved Conformer. In the Convolutional Neural Network (CNN) branch, a cross-scale feature integration convolution module is designed, incorporating multi-receptive-field convolution layers to enhance the model's ability to capture long-range dependencies and improve sensitivity to multi-scale objects. In the Vision Transformer (ViT) branch, an efficient multi-head self-attention module is developed, reducing unnecessary computation through spatial compression and feature partitioning, thereby improving overall network efficiency. Finally, a multi-feature coupling module is introduced so that the features generated by the two branches complement each other. This design retains the strength of the CNN in extracting local details while harnessing the strength of the ViT in capturing comprehensive global features. Experimental results show that, on the validation and test sets of the PASCAL VOC 2012 dataset, the mean Intersection over Union (mIoU) of the proposed method improves by 2.9% and 3.6%, respectively, over the TransCAM algorithm. The improved model also achieves a 1.3% increase in mIoU on the COCO 2014 dataset. Additionally, the number of parameters and the floating-point operations are reduced by 16.2% and 12.9%, respectively. However, the proposed method still performs poorly in complex scenarios, and further work is needed to address this limitation.
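
The results in the abstract are reported as mean Intersection over Union. As a reading aid, the following is a minimal sketch of how that metric is commonly computed from predicted and ground-truth label maps; it is not taken from the paper. The function name `mean_iou`, the flat-list input format, and the `ignore_index=255` default (the void label used in PASCAL VOC annotations) are assumptions made for illustration.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over flat sequences of integer class labels (illustrative helper).

    Assumes every prediction is a valid class id in [0, num_classes).
    """
    # conf[t][p] counts pixels whose ground-truth class is t and predicted class is p.
    conf = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:  # skip pixels marked as void in the ground truth
            continue
        conf[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = conf[c][c]
        fp = sum(conf[t][c] for t in range(num_classes)) - tp  # predicted c, truly another class
        fn = sum(conf[c]) - tp                                 # truly c, predicted another class
        denom = tp + fp + fn
        if denom > 0:  # classes absent from both prediction and ground truth are skipped
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes; the last pixel is void and is ignored.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
score = mean_iou(pred, target, num_classes=3)
print(round(score, 4))  # 0.7222, the mean of 1/2, 2/3 and 1
```

In practice the confusion matrix is accumulated over every image in the evaluation set before the per-class ratios are taken, rather than averaging per-image scores.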

Keywords: WSSS; CAM; transformer; CNN; multi-scale feature extraction; lightweight

Classification code: G63 [Cultural Sciences: Education]

 
