Semantic image segmentation using multi-scale strip pooling and channel attention  Cited by: 6


Authors: Ma Jiquan [1]; Zhao Shumin; Kong Fanhui (School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China; School of Data Science and Technology, Heilongjiang University, Harbin 150080, China)

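Affiliations: [1] School of Computer Science and Technology, Heilongjiang University, Harbin 150080; [2] School of Data Science and Technology, Heilongjiang University, Harbin 150080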

Source: Journal of Image and Graphics (《中国图象图形学报》), 2022, No. 12, pp. 3530-3541 (12 pages)

Funding: Natural Science Foundation of Heilongjiang Province (LH2021F046).

Abstract: Objective: Semantic segmentation of natural-scene images is easily affected by the diversity of object shapes, by distance and by illumination. To address this, the paper proposes a new dual-branch semantic segmentation network based on strip pooling and a channel attention mechanism (strip pooling and channel attention net, SPCANet). Method: SPCANet extracts image features from two aspects, space and content. First, the spatial-perception sub-network introduces one-dimensional dilated convolution and a multi-scale design to improve strip pooling, further enlarging the receptive field in the horizontal and vertical directions during the encoding stage. Second, to improve the model's content-perception ability, a VGG16 (Visual Geometry Group 16-layer network) pre-trained on the ImageNet dataset serves as the content-perception sub-network; it helps the spatial-perception sub-network optimize the embedded features for semantic segmentation and mitigates the loss of image detail caused by the spatial-perception sub-network. In addition, second-order channel attention further optimizes feature selection in the middle and high layers of the network and, to some extent, alleviates the effect of illumination-induced color differences on the segmentation results. Results: Experiments use the Cityscapes dataset. The proposed method is compared with other segmentation methods based on deep neural networks and analyzed in terms of both visual results and evaluation metrics. SPCANet improves the segmentation metric mIoU (mean intersection over union) by 1.2%. Conclusion: The proposed dual-branch network optimizes image semantic segmentation with improved strip pooling, a content-perception auxiliary network and a channel attention mechanism, which has a positive effect on the experimental results.

Extended abstract: Objective: Semantic segmentation of real-world images is affected by the varied shapes of objects, their distance and the illumination. Current semantic segmentation methods classify pedestrians, buildings, road signs and other objects inaccurately because of their small scale or wide extent. Existing methods also discriminate poorly between objects with chromatic aberration: they tend to split a single object with color variation into different objects, or to merge different objects with similar colors into the same class. To improve semantic segmentation performance, we propose a new dual-branch semantic segmentation network based on strip pooling and an attention mechanism (strip pooling and channel attention net, SPCANet). Method: SPCANet extracts image features through spatial and content perception. First, the spatial-perception sub-net enlarges the receptive field in the horizontal and vertical directions during the down-sampling stage by using dilated convolution and multi-scale strip pooling. Specifically, building on the strip pooling module (a pooling operation whose kernel size is n×1 or 1×n), we add four parallel one-dimensional dilated convolutions with different rates to the horizontal and vertical branches, which enhances the perception of large-scale objects in the image. Next, to improve the content-perception ability of the model, we use a VGG16 (Visual Geometry Group 16-layer network) pre-trained on the ImageNet dataset as the content-perception sub-net, which assists the spatial-perception sub-net in optimizing the embedded features for semantic segmentation. Combined with the spatial-perception sub-net, the content sub-net strengthens the feature representation. In addition, second-order channel attention is used to further optimize feature selection in the middle and high-level layers of the network. In the network training period, the target in
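The multi-scale strip pooling idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the dilation rates, the convolution weights or the way the branches are fused, so the rates `(1, 2, 4, 8)`, the fixed smoothing kernel, the averaging of the four parallel branches and the sigmoid gating are all assumptions made here for illustration, and the function names are hypothetical. The sketch works on a single-channel feature map.

```python
import numpy as np


def dilated_conv1d(signal, kernel, rate):
    """Same-length 1D dilated convolution with zero padding (odd kernel size)."""
    signal = np.asarray(signal, dtype=float)
    k = len(kernel)
    pad = rate * (k - 1) // 2
    padded = np.pad(signal, pad)  # zero padding on both ends
    out = np.zeros_like(signal)
    for i, w in enumerate(kernel):
        # Tap i reads the input at an offset of i * rate samples.
        out += w * padded[i * rate : i * rate + len(signal)]
    return out


def multi_scale_strip_pooling(feature, kernel=(0.25, 0.5, 0.25), rates=(1, 2, 4, 8)):
    """Gate an (H, W) feature map with multi-scale strip context.

    Assumed (not from the source): kernel weights, rates, branch averaging,
    and the sigmoid gate applied to the input.
    """
    feature = np.asarray(feature, dtype=float)
    # Strip pooling: average each row over the full width and
    # each column over the full height.
    row_strip = feature.mean(axis=1)  # shape (H,)
    col_strip = feature.mean(axis=0)  # shape (W,)
    # Four parallel 1D dilated convolutions per branch, averaged.
    row_ctx = np.mean([dilated_conv1d(row_strip, kernel, r) for r in rates], axis=0)
    col_ctx = np.mean([dilated_conv1d(col_strip, kernel, r) for r in rates], axis=0)
    # Broadcast both strips back to (H, W) and fuse by addition.
    fused = row_ctx[:, None] + col_ctx[None, :]
    gate = 1.0 / (1.0 + np.exp(-fused))  # sigmoid, values in (0, 1)
    return feature * gate


if __name__ == "__main__":
    fmap = np.random.default_rng(0).random((4, 6))
    print(multi_scale_strip_pooling(fmap).shape)
```

Larger rates let each strip position see context farther along the row or column without adding weights, which is the sense in which dilation enlarges the horizontal and vertical receptive field. A trained network would learn the kernels per channel rather than use a fixed one.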

Keywords: image segmentation; attention; strip pooling; dilated convolution; receptive field

Classification number: TP389.1 [Automation and Computer Technology: Computer System Architecture]

 
