Unsupervised video object segmentation network based on optical flow information fusion


Authors: WEN Biao, ZHANG Jinglei (School of Electrical Engineering and Automation, Tianjin University of Technology, Tianjin 300384, China; Tianjin Key Laboratory of Complex System Control Theory and Application, Tianjin University of Technology, Tianjin 300384, China)

Affiliations: [1] School of Electrical Engineering and Automation, Tianjin University of Technology, Tianjin 300384; [2] Tianjin Key Laboratory of Complex System Control Theory and Application, Tianjin University of Technology, Tianjin 300384

Source: Journal of Tianjin University of Technology, 2024, No. 6, pp. 94-101 (8 pages)

Funding: Tianjin Postgraduate Research and Innovation Project (2021YJSO2S27).

Abstract: With the continuous development of machine learning, especially deep learning theories and algorithms, and the massive accumulation of video data, unsupervised learning algorithms using unlabeled video information have made great progress. A dual-stream unsupervised video object segmentation network that fuses optical flow information is proposed. First, a random frame from the video sequence and the corresponding optical flow map generated by an optical flow network are separately input into a Residual Network (ResNet) backbone to extract a frame feature map and the corresponding inter-frame optical flow feature map. Second, to overcome the influence of jointly moving background information on segmentation accuracy, a position information fusion (PIF) module is designed, which fuses the position information of the input video frame and the optical flow, locating the main target while reducing the impact of background noise on segmentation. Finally, a spatial channel context information fusion (SCCF) attention module is designed, which fuses the contextual information of the frame features and the optical flow features with the classical spatial-channel attention mechanism. Experiments on the DAVIS-16 dataset show that the proposed network achieves a mean region similarity of 89.6 and a mean boundary accuracy of 87.0, both state-of-the-art results in the field.
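The dual-stream fusion described in the abstract, where frame (appearance) features and optical flow (motion) features are each re-weighted by channel and spatial attention before being combined, can be sketched as follows. This is an illustrative toy version in numpy, not the paper's actual PIF/SCCF modules: all function names, the global-average-pooling gates, and the additive fusion are simplifying assumptions standing in for the learned attention layers.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    # feat: (C, H, W). Global average pooling over space gives one
    # weight per channel; sigmoid gates it into (0, 1).
    w = sigmoid(feat.mean(axis=(1, 2)))          # (C,)
    return feat * w[:, None, None]

def spatial_attention(feat):
    # feat: (C, H, W). Averaging over channels gives a per-pixel
    # weight map that emphasizes spatial locations.
    m = sigmoid(feat.mean(axis=0))               # (H, W)
    return feat * m[None, :, :]

def fuse_streams(frame_feat, flow_feat):
    # Hypothetical stand-in for the SCCF idea: gate each stream by
    # channel then spatial attention, then fuse element-wise.
    a = spatial_attention(channel_attention(frame_feat))
    b = spatial_attention(channel_attention(flow_feat))
    return a + b

rng = np.random.default_rng(0)
frame = rng.random((8, 16, 16))   # appearance features from the frame branch
flow = rng.random((8, 16, 16))    # motion features from the optical flow branch
fused = fuse_streams(frame, flow)
print(fused.shape)  # (8, 16, 16)
```

In the actual network the attention weights would be produced by learned convolutional layers inside ResNet-based branches rather than parameter-free pooling, but the data flow, two streams gated and then merged into one feature map, is the same.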

Keywords: unsupervised learning; optical flow; video object segmentation; target position information fusion module; spatial-channel context information fusion

CLC number: TP391.1 (Automation and Computer Technology: Computer Application Technology)
