Affiliation: [1] School of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China
Source: Journal of Image and Graphics (《中国图象图形学报》), 2021, No. 2, pp. 391-401 (11 pages)
Funding: First-Class Discipline Construction Project of Ningxia Higher Education Institutions (Electronic Science and Technology) (NXYLXK2017A07); Key Research and Development Program of Ningxia Hui Autonomous Region (2018BEB04002); College Students' Innovation and Entrepreneurship Project (2019-11407-017).
Abstract: Objective: To address the problem that classic convolutional neural networks cannot meet the accuracy requirements of feature extraction for extremely small targets in images, this paper builds on the DeepLabv3plus algorithm and introduces a feature map cut module into the down-sampling process, proposing the DeepLabv3plus-IRCNet (IR: inverted residual; C: feature map cut) image semantic segmentation method to support feature extraction for extremely small image targets. Method: Features are extracted by a deep convolutional neural network composed of an ordinary convolution layer followed by a series of inverted residual modules that use depthwise separable convolutions. When the feature map resolution drops to 1/16 of the input image, the feature map cut module is introduced: each cut sub-feature-map is enlarged separately and its features are extracted with shared parameters. The output feature maps are then concatenated at their corresponding positions and fused with a feature map of the same size enlarged in the decoding stage, improving the model's ability to extract features of small objects. Result: By introducing the feature map cut module, the method increases the model's attention to small objects, takes image context information fully into account, and fuses intermediate-layer features at multiple scales, improving image segmentation accuracy. The method is validated on the CamVid (Cambridge-driving Labeled Video Database) dataset, where the mean intersection over union (mIoU) improves over the DeepLabv3plus model, confirming the method's effectiveness. Conclusion: The proposed DeepLabv3plus-IRCNet model pays sufficient attention to small objects in image segmentation and improves segmentation accuracy.

Objective: A huge amount of image data has been generated with the development of the Internet of Things and artificial intelligence technology and their widespread application in various fields. Understanding image content quickly and accurately, and automatically segmenting the target area of an image according to the requirements of the application scene, have become the focus of many researchers. In recent years, image semantic segmentation methods based on deep learning have developed steadily. These methods are widely used in automatic driving and robot engineering and have become a primary research task in computer vision. Common convolutional neural networks (CNNs) can efficiently extract the features of an image, and they typically operate directly on the entire feature map. However, extremely small targets frequently occur in a local area of an image, and the common convolution operation cannot efficiently extract their features. To solve this problem, the feature map cut module is introduced into the down-sampling process. Method: At present, the spatial pyramid pooling module and the encoder-decoder structure of a deep CNN (DCNN) have become the mainstream approaches for image semantic segmentation. The former network can extract features from an input feature map by using filters or pooling operations with multiple rates and effective fields of view, and thus encode multi-scale context information, while the latter network can capture clearer object boundaries by gradually recovering spatial information. However, many difficulties and challenges persist. The first problem is that the DCNN model has extremely high requirements for the hardware platform and is unsuitable for real-time engineering applications. The second problem is that the resolution of the feature map shrinks after the image is encoded, resulting in the loss of spatial information for some pixels. The third problem is that the segmentation process cannot effectively consider the image context information (i.e., the relationship among pixels) and cannot …
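The backbone described in the Method above chains an ordinary convolution layer with inverted residual modules built on depthwise separable convolutions. As a rough illustration of what one such module looks like (a minimal PyTorch sketch; the expansion ratio, strides, and channel widths used in DeepLabv3plus-IRCNet are not given in this abstract and are assumed here):

```python
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    """MobileNetV2-style inverted residual: 1x1 expansion ->
    3x3 depthwise convolution -> 1x1 linear projection, with a
    skip connection when input and output shapes match."""
    def __init__(self, in_ch, out_ch, stride=1, expand=6):
        super().__init__()
        hidden = in_ch * expand
        self.use_skip = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 1, bias=False),       # pointwise expansion
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, stride, 1,
                      groups=hidden, bias=False),           # depthwise 3x3 convolution
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, out_ch, 1, bias=False),       # pointwise projection
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        y = self.block(x)
        return x + y if self.use_skip else y
```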
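The feature map cut step can be pictured as follows: at 1/16 resolution the feature map is split into tiles, each tile is enlarged, a parameter-shared convolution extracts features from every tile, and the outputs are stitched back together at their original positions before being fused with a decoder feature map of the same size. The sketch below assumes a 2x2 grid, a 2x enlargement, and concatenation as the fusion operator; none of these specifics are stated in the abstract.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureMapCut(nn.Module):
    """Sketch of the feature-map-cut idea: split the feature map into a
    grid of tiles, enlarge each tile, run one shared convolution over
    every tile (parameter sharing), and stitch the outputs back together
    at their original positions."""
    def __init__(self, channels, grid=2, scale=2):
        super().__init__()
        self.grid, self.scale = grid, scale
        self.shared = nn.Sequential(                 # applied to every tile
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        n, c, h, w = x.shape
        th, tw = h // self.grid, w // self.grid
        rows = []
        for i in range(self.grid):
            cols = []
            for j in range(self.grid):
                tile = x[:, :, i * th:(i + 1) * th, j * tw:(j + 1) * tw]
                # enlarge each tile so small objects occupy more pixels
                tile = F.interpolate(tile, scale_factor=self.scale,
                                     mode='bilinear', align_corners=False)
                cols.append(self.shared(tile))
            rows.append(torch.cat(cols, dim=3))      # stitch along width
        return torch.cat(rows, dim=2)                # stitch along height

# Usage sketch: cut features fused with a decoder feature map of matching
# size (fusion shown as channel concatenation; the paper's exact operator
# is not specified in this abstract).
fm = torch.randn(1, 256, 32, 32)          # 1/16-resolution encoder features
cut = FeatureMapCut(256)(fm)              # -> (1, 256, 64, 64)
decoder = torch.randn(1, 48, 64, 64)      # decoder features at matching size
fused = torch.cat([cut, decoder], dim=1)  # -> (1, 304, 64, 64)
```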
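The keywords and the extended abstract also refer to atrous convolution and the spatial pyramid pooling module of DeepLabv3plus, which encode multi-scale context through filters with different rates. A minimal sketch of that idea follows (the dilation rates are assumptions, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

class MultiRateAtrous(nn.Module):
    """Parallel 3x3 atrous (dilated) convolutions whose dilation rates set
    the effective field of view; outputs are concatenated and projected
    to encode multi-scale context."""
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False)
            for r in rates
        ])
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))
```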
Keywords: atrous convolution; depthwise separable convolution; feature map cut; feature extraction network; feature fusion
CLC number: TP309 [Automation and Computer Technology - Computer System Architecture]