Unsupervised Video Object Segmentation via Parallel Multiple Direction Attention

Cited by: 1

Authors: FAN Jia-Qing (樊佳庆); SU Tian-Kang (苏天康); ZHANG Kai-Hua (张开华); LIU Qing-Shan (刘青山) (School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106; School of Automation, Nanjing University of Information Science and Technology, Nanjing 210044; School of Computer Science, Nanjing University of Information Science and Technology, Nanjing 210044)

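Affiliations: [1] School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106; [2] School of Automation, Nanjing University of Information Science and Technology, Nanjing 210044; [3] School of Computer Science, Nanjing University of Information Science and Technology, Nanjing 210044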

Source: Chinese Journal of Computers, 2022, No. 11, pp. 2337-2347 (11 pages)

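Funding: Supported by the Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2018AAA0100400); the National Natural Science Foundation of China (U21B2044, 61825601, 61876088); and the Jiangsu Province 333 Project talent program (BRA2020291).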

Abstract: Propagating spatio-temporal features is essential for accurate unsupervised video object segmentation (VOS). However, the complexity of real-world videos makes learning and propagating such features very challenging. This paper proposes two novel modules that enhance the spatial and temporal representations of objects in a video, respectively. First, for the current frame, a novel multiple direction attention module extracts separate attention maps along the horizontal, vertical, and channel directions. In addition, a parallel temporal module is designed to integrate information from the previous frame with that of the current frame. It computes the second-order similarity between consecutive frames in parallel and uses the resulting similarity map to re-weight and enhance the current-frame features. The similarity map also directly generates an effective mask that further augments the representation of the object in the current frame. The spatial and temporal features are then fused into the final augmented spatio-temporal representation, which is fed into a decoder to predict the mask of the target object in the current frame. Extensive experiments on three mainstream unsupervised VOS datasets show that the proposed method achieves leading performance compared with current state-of-the-art methods. The source code is available at https://github.com/su1517007879/MP-VOS.

Keywords: unsupervised video object segmentation; multiple direction attention; spatio-temporal modulation; parallel attention

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
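The abstract names the two modules but does not say how the attention maps or the second-order similarity are computed. The NumPy sketch below is therefore only an illustration of the data flow, not the authors' implementation (their code is at the repository linked in the abstract). Every function name here is hypothetical, and so is each concrete choice: average pooling followed by a sigmoid for the three directional maps, a pairwise cosine affinity standing in for the second-order similarity, a mean threshold for the mask, and plain summation for the fusion.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def multi_direction_attention(feat):
    """Re-weight a (C, H, W) feature map along three directions.

    Assumption: each map is the sigmoid of an average pool over the
    other two axes; the paper may use learned layers instead.
    """
    a_horizontal = sigmoid(feat.mean(axis=(0, 1)))  # (W,) one weight per column
    a_vertical = sigmoid(feat.mean(axis=(0, 2)))    # (H,) one weight per row
    a_channel = sigmoid(feat.mean(axis=(1, 2)))     # (C,) one weight per channel
    out = (feat * a_horizontal[None, None, :]
           + feat * a_vertical[None, :, None]
           + feat * a_channel[:, None, None]) / 3.0
    return out, (a_horizontal, a_vertical, a_channel)


def temporal_reweight(cur, prev, eps=1e-8):
    """Re-weight current-frame features by similarity to the previous frame.

    Assumption: cosine affinity between every current/previous pixel pair,
    computed in a single matrix product, stands in for the paper's
    second-order similarity.
    """
    c, h, w = cur.shape
    cur_flat = cur.reshape(c, h * w)
    prev_flat = prev.reshape(c, h * w)
    cur_n = cur_flat / (np.linalg.norm(cur_flat, axis=0, keepdims=True) + eps)
    prev_n = prev_flat / (np.linalg.norm(prev_flat, axis=0, keepdims=True) + eps)
    affinity = cur_n.T @ prev_n                    # (H*W, H*W), all pairs at once
    sim_map = affinity.max(axis=1).reshape(h, w)   # best match per current pixel
    mask = (sim_map >= sim_map.mean()).astype(cur.dtype)  # hypothetical threshold
    reweighted = cur * sim_map[None, :, :]
    return reweighted, sim_map, mask


# Toy features standing in for backbone outputs of two consecutive frames.
rng = np.random.default_rng(0)
prev = rng.standard_normal((8, 6, 5))
cur = rng.standard_normal((8, 6, 5))

spatial, maps = multi_direction_attention(cur)
temporal, sim_map, mask = temporal_reweight(cur, prev)

# Assumed fusion: sum the spatial branch, the temporal branch, and the
# mask-augmented current features before passing them to a decoder.
fused = spatial + temporal + cur * mask[None, :, :]
print(fused.shape)
```

The pairwise affinity matrix grows quadratically with the number of spatial positions, so a sketch like this is only practical on downsampled feature maps.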
