Authors: Jisheng DANG; Huicheng ZHENG [1,3,4]; Bimei WANG; Juncheng LI; Henghui DING; Jianhuang LAI
Affiliations: [1] School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China; [2] School of Computing, National University of Singapore, Singapore 119391, Singapore; [3] Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, Guangzhou 510006, China; [4] Guangdong Province Key Laboratory of Information Security Technology, Guangzhou 510006, China; [5] College of Information Science and Technology, Jinan University, Guangzhou 510632, China; [6] Institute of Big Data, Fudan University, Shanghai 200433, China
Source: Scientia Sinica (Informationis) (《中国科学:信息科学》), 2025, Issue 1, pp. 80-93 (14 pages)
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 61976231, 61972435, U20A20185); the Guangdong Basic and Applied Basic Research Foundation (Grant Nos. 2023A1515012853, 2022B1515020103, 2019A1515011869); and the Shenzhen Science and Technology Program (Grant No. RCYX20200714114641140).
Abstract: Video object segmentation (VOS) aims to automatically segment objects of interest in videos, with wide applications in areas such as video editing, robot navigation, and autonomous driving. Existing methods for video object segmentation mostly rely on independent-frame appearance memory, which often falls short when dealing with complex video scenes with severe occlusions or appearance similarities. To address these challenges, this paper proposes a VOS method based on frame-wise and segment-wise spatio-temporal interaction memory (FSSTIM). FSSTIM introduces frame-wise and segment-wise spatio-temporal interaction memory construction blocks, which extract segment-wise spatio-temporal memory feature maps by constructing spatio-temporal context graph networks and enhance them by interacting with frame-wise memory feature maps, significantly improving the network's ability to handle similar appearances and object occlusions. Furthermore, the introduction of dynamic sampling memory readers achieves efficient multi-granularity historical information retrieval, speeding up inference and improving segmentation accuracy. Experiments on the popular VOS datasets DAVIS, YouTube-VOS, and MOSE demonstrate that the proposed method achieves state-of-the-art performance while maintaining real-time processing speed and strong generalization capability.
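For orientation, memory-network VOS methods of the kind the abstract describes typically propagate masks by letting the current frame's features attend over key/value features stored from past frames. The sketch below illustrates that generic attention-based memory read in NumPy; it is a simplified illustration of the general family, not FSSTIM's actual frame-wise/segment-wise interaction modules or its dynamic sampling reader, and all shapes and names are assumptions.

```python
import numpy as np

def memory_read(query_key, mem_keys, mem_values):
    """Generic attention-based memory read used by memory-network VOS
    methods (simplified illustration; not FSSTIM's exact modules).

    query_key:  (C, N)      key features of the current frame, N = H*W locations
    mem_keys:   (T, C, N)   key features of T memorized frames
    mem_values: (T, D, N)   value features of the memorized frames
    Returns:    (D, N)      memory feature read out for the query frame
    """
    T, C, N = mem_keys.shape
    D = mem_values.shape[1]
    # Flatten time and space: every memorized location becomes one memory slot.
    keys = mem_keys.transpose(1, 0, 2).reshape(C, T * N)      # (C, T*N)
    values = mem_values.transpose(1, 0, 2).reshape(D, T * N)  # (D, T*N)
    # Scaled dot-product affinity between each query location and each slot.
    affinity = keys.T @ query_key / np.sqrt(C)                # (T*N, N)
    affinity -= affinity.max(axis=0, keepdims=True)           # numerical stability
    weights = np.exp(affinity)
    weights /= weights.sum(axis=0, keepdims=True)             # softmax over slots
    # Each query location reads a convex combination of memory values.
    return values @ weights                                   # (D, N)
```

Because the softmax weights are non-negative and sum to one per query location, the read-out at each location is a convex combination of stored value features; richer designs (such as the segment-wise memory the paper proposes) change how the stored keys and values are constructed rather than this basic read step.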
Keywords: video object segmentation; frame-wise and segment-wise spatio-temporal interaction; memory network; spatio-temporal context association network; dynamic sampling memory reading
Classification: TP391.41 (Automation and Computer Technology: Computer Application Technology)