Fusing Spatiotemporal Clues with Cascading Neural Networks for Foreground-Background Separation (Cited by: 1)

Authors: Yang Jingyu, Shi Wen, Li Kun [2], Song Xiaolin, Yue Huanjing

Affiliations: [1] School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China; [2] School of Computer Science and Technology, Tianjin University, Tianjin 300350, China

Source: Journal of Tianjin University: Science and Technology (天津大学学报(自然科学与工程技术版)), 2020, Issue 6, pp. 633-640 (8 pages)

Funding: National Natural Science Foundation of China (61571322, 61771339, 61672378); Tianjin Science and Technology Program (17ZXRGGX00160, 18JCYBJC19200)

Abstract: Foreground-background separation in video suffers from foreground leakage in complex scenes. To address this problem, an end-to-end two-stage cascaded deep convolutional neural network is designed that accurately separates the foreground and background of an input video sequence. The proposed network cascades a first-stage foreground detection sub-network with a second-stage background reconstruction sub-network. The first sub-network fuses temporal and spatial information; its input consists of two parts: three consecutive RGB video frames (the previous, current, and next frames) and the three corresponding optical flow maps. By combining the two parts of the input, the foreground detection sub-network accurately detects the moving foreground in the video sequence and generates a binary foreground mask. This sub-network is an encoder-decoder network: the encoder adopts the first five convolution blocks of VGG16 to extract feature maps from the two inputs and fuses the two types of feature maps after each convolution layer; the decoder consists of five deconvolution modules that learn a mapping from the feature space to the image space to generate the binary foreground mask of the current frame. The second sub-network consists of three parts: an encoder, a transmission layer, and a decoder. Using the current frame and the generated foreground mask, it reconstructs with high quality the background pixels occluded by the foreground. Experimental results show that the proposed spatiotemporal-aware cascaded convolutional neural network achieves better results than other methods on public datasets, handles various complex scenes with strong generality and generalization ability, and its foreground detection and background reconstruction results significantly outperform several existing methods.
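The abstract describes the two-stage architecture but not its implementation details. The PyTorch sketch below illustrates one plausible reading of it: a dual-stream VGG16-style encoder (stacked RGB frames and optical-flow maps) with block-wise feature fusion and a five-module deconvolution decoder producing the foreground mask, cascaded into an encoder / transmission-layer / decoder background reconstructor. The channel counts, the element-wise-addition fusion, the dilated-convolution transmission layer, the 3-channel flow renderings, and all layer widths beyond the VGG16 block structure are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the two-stage cascade described in the abstract.
# Fusion operator, "transmission layer" design, and layer widths are assumptions.
import torch
import torch.nn as nn

# VGG16 convolutional configuration: (out_channels, num_convs) for the first 5 blocks.
VGG16_BLOCKS = [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]

def vgg_block(in_ch, out_ch, num_convs):
    layers = []
    for _ in range(num_convs):
        layers += [nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True)]
        in_ch = out_ch
    layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)

class ForegroundDetector(nn.Module):
    """Stage 1: two VGG16-style encoders (RGB frames / optical flow) whose features
    are fused block by block, followed by a five-module deconvolution decoder
    that outputs a foreground mask for the current frame."""
    def __init__(self, rgb_ch=9, flow_ch=9):
        super().__init__()
        self.rgb_blocks, self.flow_blocks = nn.ModuleList(), nn.ModuleList()
        in_r, in_f = rgb_ch, flow_ch
        for out_ch, n in VGG16_BLOCKS:
            self.rgb_blocks.append(vgg_block(in_r, out_ch, n))
            self.flow_blocks.append(vgg_block(in_f, out_ch, n))
            in_r = in_f = out_ch
        decoder, in_ch = [], 512
        for out_ch in [512, 256, 128, 64, 32]:
            decoder += [nn.ConvTranspose2d(in_ch, out_ch, 4, stride=2, padding=1),
                        nn.ReLU(inplace=True)]
            in_ch = out_ch
        decoder.append(nn.Conv2d(32, 1, 3, padding=1))
        self.decoder = nn.Sequential(*decoder)

    def forward(self, frames, flows):
        x, y = frames, flows
        for rgb_b, flow_b in zip(self.rgb_blocks, self.flow_blocks):
            x, y = rgb_b(x), flow_b(y)
            x = x + y  # assumed fusion: element-wise addition of the two streams
        return torch.sigmoid(self.decoder(x))  # soft mask; threshold to binarize

class BackgroundReconstructor(nn.Module):
    """Stage 2: encoder -> transmission layer -> decoder, inpainting the
    background pixels occluded by the detected foreground."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        # "Transmission layer": assumed here to be dilated convolutions that
        # propagate context from visible background into the masked region.
        self.transmitter = nn.Sequential(
            nn.Conv2d(128, 128, 3, padding=2, dilation=2), nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, 3, padding=4, dilation=4), nn.ReLU(inplace=True))
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Sigmoid())

    def forward(self, frame, mask):
        masked = frame * (1 - mask)  # remove detected foreground pixels
        return self.decoder(self.transmitter(self.encoder(torch.cat([masked, mask], 1))))

# Usage: previous/current/next frames and their flow maps, each rendered as 3 channels.
frames = torch.rand(1, 9, 256, 256)  # 3 stacked RGB frames
flows = torch.rand(1, 9, 256, 256)   # 3 stacked optical-flow maps
mask = ForegroundDetector()(frames, flows)
background = BackgroundReconstructor()(frames[:, 3:6], mask)  # channels 3:6 = current frame
```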

Keywords: background reconstruction; moving object detection; convolutional neural network; optical flow

Classification code: TP391 (Automation and Computer Technology: Computer Application Technology)

 
