Uncertainty-Aware Adaptive Pseudo-Labeling for Referring Video Object Segmentation


Authors: Shiming Zhang (张施明); Zhiqian Chen (陈智谦); Jinpeng Mi (米金鹏) (Institute of Machine Intelligence, University of Shanghai for Science and Technology, Shanghai)

Affiliation: [1] Institute of Machine Intelligence, University of Shanghai for Science and Technology, Shanghai

Source: Modeling and Simulation (《建模与仿真》), 2025, No. 2, pp. 236-244 (9 pages)

Funding: National Natural Science Foundation of China (62106026, 62272170, 42130112); Natural Science Foundation of Shanghai, General Program (23ZR1419300).

Abstract: Referring video object segmentation (RVOS) is an emerging multimodal task that aims to segment target regions in video clips by understanding the semantics of given referring expressions. However, the annotations of the benchmark datasets are collected in a semi-supervised manner, which provides ground-truth object masks only for the first frame of each video. To explore the hidden knowledge in the unlabeled data within a more integrated framework, we introduce online pseudo-labeling to address RVOS. Specifically, we employ the checkpoints learned on the fly in previous training epochs as the teacher model to produce pseudo-labels on the unlabeled video frames, and the obtained pseudo-labels are used to augment the training data and supervise the subsequent training stage. To avoid the confusion introduced by pseudo-labels, we propose an uncertainty-aware refinement strategy that adaptively rectifies the generated pseudo-labels based on the model's prediction confidence. We conduct extensive experiments on the benchmark datasets Refer-YouTube-VOS and Refer-DAVIS17 to validate the proposed approach. The experimental results demonstrate that our model achieves competitive results compared with state-of-the-art models.
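The abstract does not specify how the uncertainty-aware refinement is computed, so the following is only a minimal sketch of one common way to use prediction confidence when training on pseudo-labels. It assumes (hypothetically) that the teacher checkpoint outputs a per-pixel foreground probability, that uncertainty is measured by binary entropy, and that uncertain pixels are simply excluded from the student's loss. The names `refine_pseudo_label`, `masked_bce`, and the `max_entropy` threshold are illustrative and are not taken from the paper.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) prediction; 0 when p is 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def refine_pseudo_label(teacher_probs, max_entropy=0.5):
    """Convert teacher probabilities into a hard pseudo-mask and a keep-mask.

    teacher_probs: 2D list of foreground probabilities in [0, 1].
    max_entropy: assumed threshold; pixels with higher entropy are dropped.
    """
    labels, keep = [], []
    for row in teacher_probs:
        labels.append([1 if p >= 0.5 else 0 for p in row])
        keep.append([1 if binary_entropy(p) <= max_entropy else 0 for p in row])
    return labels, keep


def masked_bce(student_probs, labels, keep, eps=1e-7):
    """Binary cross-entropy averaged over the kept (confident) pixels only."""
    total, count = 0.0, 0
    for s_row, l_row, k_row in zip(student_probs, labels, keep):
        for s, l, k in zip(s_row, l_row, k_row):
            if k:
                s = min(max(s, eps), 1.0 - eps)  # clamp to avoid log(0)
                total += -(l * math.log(s) + (1 - l) * math.log(1.0 - s))
                count += 1
    return total / count if count else 0.0


# Teacher output on one unlabeled 2x2 frame (made-up values for illustration).
teacher = [[0.95, 0.55], [0.05, 0.40]]
labels, keep = refine_pseudo_label(teacher)
loss = masked_bce([[0.95, 0.50], [0.05, 0.50]], labels, keep)
```

In this sketch the two pixels with probabilities near 0.5 are treated as uncertain and contribute nothing to the loss, so only confident teacher predictions supervise the student. In an online setting, the teacher probabilities would come from a checkpoint saved in an earlier epoch of the same training run.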

Keywords: referring video object segmentation; pseudo-labeling; uncertainty-aware refinement

Classification code: TP391.41 [Automation and Computer Technology - Computer Application Technology]

 
