Bottom-up panoptic segmentation fusing improved ASPP and polarized self-attention  Cited by: 2

The improved atrous spatial pyramid pooling and polarized self-attention based bottom-up panoptic segmentation


Authors: Li Xinye[1,2], Chen Ding (Department of Electronic and Communication Engineering, North China Electric Power University, Baoding 071003, China; Hebei Key Laboratory of Power Internet of Things Technology, North China Electric Power University, Baoding 071003, China)

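Affiliations: [1] Department of Electronic and Communication Engineering, North China Electric Power University, Baoding 071003, China; [2] Hebei Key Laboratory of Power Internet of Things Technology, North China Electric Power University, Baoding 071003, China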

Source: Journal of Image and Graphics (中国图象图形学报), 2023, No. 8, pp. 2410-2419, 10 pages

Funding: Fundamental Research Funds for the Central Universities (2020YJ006); Hebei Provincial Science and Technology Program (SZX2020034).

Abstract: Objective To address two problems, namely that atrous convolution in ASPP (atrous spatial pyramid pooling) becomes less effective as the dilation rate grows, and that ResNet (residual neural network), a classic image classification model, is not well suited to fine-grained image segmentation, a bottom-up panoptic segmentation method based on improved ASPP and polarized self-attention is proposed. Method The ASPP module is redesigned. The output of the small-dilation-rate convolution is concatenated (concat) with the original input, and the result is passed as the new input to the large-dilation-rate convolution. The outputs of the convolutions with different dilation rates are then concatenated, and that result is finally concatenated with the outputs of the other modules in ASPP. This mitigates the degradation of atrous convolution caused by large dilation rates, so that a sufficient receptive field is obtained while multi-scale information is still encoded. An improved polarized self-attention module is introduced after the output of the backbone network to strengthen self-attention at the pixel level, so that the resulting features are directly applicable to fine-grained pixel segmentation. Results The method was tested on the validation set of Cityscapes. Compared with the reproduced baseline Panoptic-DeepLab (PQ 58.26%), improving the ASPP module raised PQ (panoptic quality) to 58.61%, a gain of 0.35 percentage points, while the runtime rose from 103 ms to 124 ms, with no obvious change in speed. Adding polarized self-attention raised PQ to 58.86%, a further gain of 0.25 percentage points, with the runtime rising to 187 ms. Further improving the attention module raised PQ to 59.36%, another 0.50 percentage points above 58.86%, with the runtime rising to 192 ms. Speed drops slightly, but real-time performance is still better than that of most methods. Conclusion With the improved ASPP and polarized self-attention modules, features suited to fine-grained pixel segmentation are extracted more effectively, and multi-scale information is encoded while a sufficient receptive field is kept, which improves panoptic segmentation performance.

Objective Panoptic segmentation is a challenging task in computer vision and image segmentation. It aims to segment every object in an image, covering both the foreground "thing" categories and the background "stuff" categories. Panoptic segmentation combines semantic segmentation and instance segmentation to a certain extent, and it is relevant to vision applications such as autonomous driving, simultaneous localization and mapping (SLAM), and multi-object tracking and segmentation (MOTS). Most panoptic segmentation methods follow the top-down path and the principle of detection before segmentation. Such methods build on instance segmentation or object detection and add a semantic branch to perform semantic segmentation. These models segment well, but they need a complex post-processing stage to resolve conflicts between and within branches, which slows inference. Another category of methods follows the bottom-up idea: semantic segmentation is taken as the basis, and the image is recognized as a whole at the pixel level, which avoids the tedious post-processing. Recently, a bottom-up panoptic segmentation method (Panoptic-DeepLab) divided the panoptic segmentation task into two branches. Each branch has its own decoder network and segmentation head network. The semantic segmentation head outputs the semantic segmentation result. Two instance heads with the same structure predict the instance center and the offset simultaneously. It achieves better segmentation accuracy and speed. However, the atrous spatial pyramid pooling (ASPP) module is still used in the decoder network to enlarge the receptive field. To obtain a large enough receptive field, ASPP needs a sufficiently large dilation rate, but the larger the dilation rate, the worse the atrous convolution performs. On the other hand, a residual neural network (ResNet) is used as the shared encoder, which may be sub-optimal for fine-grained i
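
The cascaded ASPP wiring described in the abstract (each larger-rate branch receives the original input concatenated with the previous branch's output) can be illustrated with a small one-dimensional sketch. This is not the authors' implementation: the dilation rates (6, 12, 18), the 3-tap kernel, and the helper names `branch_offsets`, `cascaded_offsets`, and `span` are assumptions made only for illustration. The sketch tracks which input positions each branch can read. One common explanation of why large dilation rates hurt is that the taps become very sparse, and the cascade fills in those gaps while also widening the receptive field.

```python
# Illustrative 1-D model of parallel vs. cascaded dilated branches.
# Rates, kernel size, and all function names are hypothetical.

def branch_offsets(rate):
    """Input offsets read by one 3-tap dilated convolution (1-D view)."""
    return {-rate, 0, rate}


def cascaded_offsets(rates):
    """Offsets visible to each branch when every branch takes the original
    input concatenated with the previous branch's output."""
    seen_by_branch = []
    prev = None  # offsets already covered by the previous branch's output
    for rate in rates:
        taps = branch_offsets(rate)
        # The concatenated input carries the raw input (offset 0) and,
        # after the first branch, the previous branch's coverage.
        sources = {0} if prev is None else {0} | prev
        current = {t + s for t in taps for s in sources}
        seen_by_branch.append(current)
        prev = current
    return seen_by_branch


def span(offsets):
    """Receptive-field width covered by a set of offsets."""
    return max(offsets) - min(offsets) + 1


if __name__ == "__main__":
    rates = (6, 12, 18)  # assumed rates, not taken from the paper
    for rate, casc in zip(rates, cascaded_offsets(rates)):
        par = branch_offsets(rate)
        print(f"rate {rate:2d}: parallel {len(par)} taps / span {span(par)}, "
              f"cascaded {len(casc)} taps / span {span(casc)}")
```

Under these assumptions, the rate-18 branch in a parallel layout reads 3 positions spanning 37 inputs, whereas the cascaded layout reads 13 positions spanning 73 inputs. The larger receptive field therefore comes with denser sampling, which is consistent with the abstract's stated goal of keeping a sufficient receptive field while encoding multi-scale information.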

Keywords: panoptic segmentation; semantic segmentation; instance segmentation; polarized self-attention; ASPP

Classification number: TP391.4 [Automation and Computer Technology - Computer Application Technology]

 
