Unsupervised Video Object Segmentation Based on Discrete Cosine Transform Feature Fusion

Authors: WANG Yuchen (王玉琛); FAN Jiaqing (樊佳庆); SONG Huihui (宋慧慧) (School of Automation, Nanjing University of Information Science & Technology, Nanjing, Jiangsu 210044; College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106)

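Affiliations: [1] School of Automation, Nanjing University of Information Science & Technology, Nanjing 210044 [2] College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106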

Source: Computer & Digital Engineering (《计算机与数字工程》), 2025, No. 2, pp. 395-402 (8 pages)

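Funding: Supported by the National Natural Science Foundation of China (No. 61532009), the Natural Science Foundation of Jiangsu Province (No. BK20191397), and the Jiangsu Province Postgraduate Practice Innovation Program (No. sjcx22_0355).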

Abstract: Unsupervised video object segmentation (UVOS) aims to localize and segment foreground objects in a video without a manually provided ground-truth segmentation mask for the first frame. Existing methods focus mainly on improving segmentation accuracy and neglect memory and computational cost. They typically enhance the fused appearance and motion features only in the spatial domain, according to feature importance, and ignore how features differ in the frequency domain. They also do not make full use of global semantic information to guide segmentation. To address these problems, this paper proposes a lightweight UVOS network based on discrete cosine transform (DCT) feature fusion. First, a lightweight backbone network extracts appearance and motion features simultaneously. Second, a DCT feature fusion module fuses and enhances the appearance and motion features. Third, a large-kernel convolution global semantic guidance module decomposes the large-kernel convolution, which reduces computation while preserving the ability to extract global semantic information. Finally, guided by the global semantic information, the frequency-enhanced multi-level features are aggregated level by level to produce an accurate segmentation result. With this design, the proposed method has only 14.7 M parameters. Extensive experiments on the DAVIS2016, FBMS, and DAVSOD datasets show that the method achieves favorable performance on several metrics, including J&F, MAE, and Fm, while maintaining fast inference.
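The abstract does not spell out the internals of the DCT feature fusion module, so the following is only a minimal NumPy sketch of one plausible reading: appearance and motion features are summed, each channel is projected onto a 2D DCT basis to obtain a frequency-domain descriptor, and a small bottleneck turns those descriptors into per-channel weights. The function names (`dct_basis`, `dct_channel_descriptor`, `dct_fuse`), the additive fusion, the choice of frequencies, and the random matrices `w1` and `w2` standing in for learned weights are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def dct_basis(h, w, u, v):
    """Unnormalized 2D DCT-II basis of frequency (u, v) on an h x w grid."""
    ys = np.cos(np.pi * u * (np.arange(h) + 0.5) / h)
    xs = np.cos(np.pi * v * (np.arange(w) + 0.5) / w)
    return np.outer(ys, xs)


def dct_channel_descriptor(feat, freqs):
    """Project each channel of feat (C, H, W) onto one DCT basis.

    Channels are split into len(freqs) equal groups; group i uses freqs[i].
    """
    c, h, w = feat.shape
    assert c % len(freqs) == 0, "channels must divide evenly into groups"
    group = c // len(freqs)
    desc = np.empty(c)
    for i, (u, v) in enumerate(freqs):
        basis = dct_basis(h, w, u, v)
        sl = slice(i * group, (i + 1) * group)
        desc[sl] = (feat[sl] * basis).sum(axis=(1, 2))
    return desc


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def dct_fuse(appearance, motion, w1, w2, freqs):
    """Fuse two (C, H, W) maps and reweight channels by DCT descriptors."""
    fused = appearance + motion                # assumed additive fusion
    desc = dct_channel_descriptor(fused, freqs)
    hidden = np.maximum(w1 @ desc, 0.0)        # ReLU bottleneck
    weights = sigmoid(w2 @ hidden)             # per-channel weights in (0, 1)
    return fused * weights[:, None, None], weights


rng = np.random.default_rng(0)
C, H, W = 8, 7, 7
appearance = rng.standard_normal((C, H, W))
motion = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // 2, C)) * 0.1    # placeholder for learned weights
w2 = rng.standard_normal((C, C // 2)) * 0.1    # placeholder for learned weights
freqs = [(0, 0), (0, 1), (1, 0), (1, 1)]       # hypothetical frequency choice
out, weights = dct_fuse(appearance, motion, w1, w2, freqs)
print(out.shape, weights.round(3))
```

With frequency (0, 0) the basis is constant, so the descriptor reduces to a sum over the spatial grid, that is, global average pooling up to a scale factor. Using additional frequencies is what lets such a module distinguish channels that share the same mean but differ in spatial variation.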

Keywords: unsupervised video object segmentation; discrete cosine transform; attention mechanism; frequency-domain analysis

Classification number: TP391.41 [Automation and Computer Technology: Computer Application Technology]
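The abstract states that the large-kernel convolution is decomposed to cut computation while keeping a large receptive field, but it does not say which decomposition is used. As a rough illustration only, the sketch below counts weights for one common scheme: a K x K convolution replaced by a (2d-1) x (2d-1) depthwise convolution, a ceil(K/d) x ceil(K/d) depthwise convolution with dilation d, and a 1 x 1 pointwise convolution. The scheme, the function names, and the values K = 21, C = 64, d = 3 are assumptions, and biases are ignored.

```python
import math


def dense_params(k, c):
    """Weights in a standard k x k convolution with c input and c output channels."""
    return k * k * c * c


def decomposed_params(k, c, d):
    """Weights in the assumed depthwise + dilated depthwise + pointwise decomposition."""
    local = (2 * d - 1) ** 2 * c            # small depthwise convolution
    dilated = math.ceil(k / d) ** 2 * c     # depthwise convolution with dilation d
    pointwise = c * c                       # 1 x 1 convolution mixing channels
    return local + dilated + pointwise


def receptive_field(k, d):
    """Receptive field of the two stacked depthwise convolutions."""
    m = math.ceil(k / d)
    dilated_extent = (m - 1) * d + 1        # extent covered by the dilated kernel
    return dilated_extent + (2 * d - 1) - 1


k, c, d = 21, 64, 3                         # hypothetical kernel size, channels, dilation
dense = dense_params(k, c)
decomp = decomposed_params(k, c, d)
print(f"dense: {dense}, decomposed: {decomp}, ratio: {dense / decomp:.1f}x")
print(f"receptive field of decomposition: {receptive_field(k, d)}")
```

Under these assumptions the decomposed form still covers at least the original K x K window while using far fewer weights, which is consistent with the trade-off the abstract describes.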
