Mask generation dynamically regulates weakly supervised video instance segmentation


Authors: HE Zifen [1]; XU Lin; ZHANG Yinhui [1]; HUANG Ying (Faculty of Mechanical and Electrical Engineering, Kunming University of Science and Technology, Kunming 650500, China)

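Affiliation: [1] Faculty of Mechanical and Electrical Engineering, Kunming University of Science and Technology, Kunming, Yunnan 650000, China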

Source: Optics and Precision Engineering (《光学精密工程》), 2023, No. 19, pp. 2884-2897 (14 pages)

Funding: National Natural Science Foundation of China (No. 62171206, No. 62061022).

Abstract: Fully supervised video instance segmentation networks depend heavily on fine mask annotations, whose time and labor costs are high, so intelligent machines cannot adapt quickly to new scenes. To address this, an end-to-end weakly supervised video instance segmentation (WSVIS) network with dynamically regulated mask generation is proposed. First, to overcome the loss of instance activation features caused by the sudden drop in channel dimension at the initial mask prediction layer, a multi-level feature fusion module is constructed; it predicts the initial instance features through a feature reuse strategy and fuses relative position information to generate the initial predicted mask. Second, a dynamic regulation mechanism is proposed to establish mask feature dependencies in the channel and spatial dimensions, strengthening the dynamic interaction between the initial predicted mask and instance-aware information. Finally, the network uses binary color similarity to generate pseudo-affinity labels that replace fine mask annotations, and combines them with bounding box and mask consistency losses to achieve weakly supervised video instance segmentation from bounding box annotations alone. Experimental results show that, on the BoxSet and YT-VIS datasets, the WSVIS network achieves segmentation accuracy and quality close to those of fully supervised networks while meeting real-time inference requirements, providing theoretical support and an algorithmic basis for intelligent machines to adapt quickly to new scenes and to perceive and understand their environment in real time.
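The abstract does not give the formula behind the binary color similarity step, so the following is only a minimal sketch of how pseudo-affinity labels might be derived from pixel colors. It assumes a similarity of the form exp(-||c_i - c_j|| / theta) between adjacent pixels, thresholded at tau; the function name `color_affinity_labels` and the parameters `theta` and `tau` are hypothetical and are not taken from the paper.

```python
import numpy as np

def color_affinity_labels(image, theta=2.0, tau=0.3):
    """Pseudo-affinity labels for adjacent pixel pairs (illustrative only).

    image: array of shape (H, W, C) holding per-pixel colors.
    theta, tau: assumed temperature and threshold, not values from the paper.
    Returns two boolean arrays: labels for (pixel, right neighbor) pairs with
    shape (H, W-1), and for (pixel, lower neighbor) pairs with shape (H-1, W).
    """
    image = np.asarray(image, dtype=np.float64)
    # Euclidean color distance between each pixel and its right / lower neighbor.
    d_right = np.linalg.norm(image[:, 1:] - image[:, :-1], axis=-1)
    d_down = np.linalg.norm(image[1:, :] - image[:-1, :], axis=-1)
    # Similarity lies in (0, 1]; identical colors give exactly 1.
    s_right = np.exp(-d_right / theta)
    s_down = np.exp(-d_down / theta)
    # A pair is labeled "same instance" when its colors are similar enough.
    return s_right >= tau, s_down >= tau
```

In a box-supervised setup of this kind, only the pairs labeled true would typically contribute to a loss that encourages the two pixels to receive the same mask prediction, which is one way such labels can stand in for fine mask annotations.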

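Keywords: intelligent machine; weakly supervised video instance segmentation; multi-level feature fusion; dynamic regulation; binary color similarity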

Classification code: TP394.1 [Automation and Computer Technology: Computer Application Technology]

 
