Video Object Segmentation Algorithm Based on Multi-scale Feature Enhancement and Global-Local Feature Aggregation  (Cited by: 1)

Authors: HOU Zhiqiang (侯志强), DONG Jiale (董佳乐), MA Sugang (马素刚), WANG Chenxu (王晨旭) [1,2], YANG Xiaobao (杨小宝), WANG Yunchen (王昀琛)

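Affiliations: [1] Institute of Computer, Xi'an University of Posts and Telecommunications, Xi'an 710121, China; [2] Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an University of Posts and Telecommunications, Xi'an 710121, China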

Source: Journal of Electronics & Information Technology (《电子与信息学报》), 2024, No. 11, pp. 4198-4207 (10 pages)

Funding: National Natural Science Foundation of China (62072370); Natural Science Foundation of Shaanxi Province (2023-JC-YB-598).

Abstract: To address the insufficient multi-scale feature representation and the underuse of shallow features in memory-network algorithms, this paper proposes a Video Object Segmentation (VOS) algorithm based on multi-scale feature enhancement and global-local feature aggregation. First, a multi-scale feature enhancement module fuses feature information at different scales from the reference mask branch and the reference RGB branch, strengthening the representation of multi-scale features. Second, a global-local feature aggregation module extracts features with convolutions of different receptive-field sizes and adaptively fuses the features of global and local regions through a feature aggregation module; this fusion better captures both the global features and the fine details of the target and improves segmentation accuracy. Finally, a cross-layer fusion module uses the spatial detail in shallow features to improve the precision of the segmentation mask: fusing shallow features with deep features better captures the target's details and edges. Experiments on the public datasets DAVIS2016, DAVIS2017, and YouTube-2018 show that the algorithm reaches an overall performance of 91.8%, 84.5%, and 83.0%, respectively, and runs in real time on both single-object and multi-object segmentation tasks.
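The global-local aggregation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no layer details, so the mean (box) filters below stand in for learned convolutions with small and large receptive fields, and the sigmoid gate with fixed `gate_weight` and `gate_bias` stands in for whatever learned adaptive fusion the paper uses. All function and parameter names are hypothetical.

```python
import math


def box_filter(feat, radius):
    """Mean over a (2*radius+1) x (2*radius+1) window, clipped at the borders.

    Stand-in (assumption) for a learned convolution with that receptive field.
    """
    h, w = len(feat), len(feat[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            rows = range(max(0, i - radius), min(h, i + radius + 1))
            cols = range(max(0, j - radius), min(w, j + radius + 1))
            vals = [feat[r][c] for r in rows for c in cols]
            out[i][j] = sum(vals) / len(vals)
    return out


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def aggregate_global_local(feat, local_radius=1, global_radius=3,
                           gate_weight=1.0, gate_bias=0.0):
    """Fuse a small-receptive-field branch with a large-receptive-field branch.

    The per-pixel gate is a fixed-parameter placeholder (assumption) for a
    learned fusion weight; the output is a convex combination of both branches.
    """
    local = box_filter(feat, local_radius)     # local detail branch
    global_ = box_filter(feat, global_radius)  # wider context branch
    h, w = len(feat), len(feat[0])
    fused = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            gate = sigmoid(gate_weight * (local[i][j] - global_[i][j]) + gate_bias)
            fused[i][j] = gate * local[i][j] + (1.0 - gate) * global_[i][j]
    return fused


# Usage: a 5x5 single-channel map with one strong response in the centre.
feature_map = [[0.0] * 5 for _ in range(5)]
feature_map[2][2] = 9.0
fused_map = aggregate_global_local(feature_map)
```

Because the gate lies strictly between 0 and 1, each fused value falls between the local and global responses at that pixel, which is the sense in which the fusion is adaptive rather than a fixed average.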

Keywords: video object segmentation; memory network; Siamese network; feature fusion; mask refinement

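Classification No.: TN911.73 [Electronics and Telecommunications: Communication and Information Systems]; TP391.41 [Electronics and Telecommunications: Information and Communication Engineering]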

 
