Progress in weakly supervised learning for visual understanding (视觉弱监督学习研究进展)  Cited by: 12


Authors: Ren Dongwei[1], Wang Qilong[2], Wei Yunchao[3], Meng Deyu[4], Zuo Wangmeng[1]

Affiliations: [1] Harbin Institute of Technology, Harbin 150001, China; [2] Tianjin University, Tianjin 300350, China; [3] Beijing Jiaotong University, Beijing 100091, China; [4] Xi'an Jiaotong University, Xi'an 710049, China

Source: Journal of Image and Graphics (中国图象图形学报), 2022, No. 6, pp. 1768-1798 (31 pages)

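Funding: Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2021ZD0112100); National Natural Science Foundation of China (62172127, U19A2073).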

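Abstract: Visual understanding tasks such as object detection, semantic and instance segmentation, and action recognition are widely used and play a crucial role in fields such as human-computer interaction and autonomous driving. In recent years, deep visual understanding networks based on fully supervised learning have achieved significant performance gains. However, annotating data for object detection, semantic and instance segmentation, and video action recognition often requires substantial labor and time, which has become a key factor limiting their wider application. Weakly supervised learning, as an effective way to reduce annotation cost, is expected to offer a feasible solution to this problem and has therefore attracted considerable attention. Focusing on visual weakly supervised learning, this paper surveys research progress in China and abroad, taking object detection, semantic and instance segmentation, and action recognition as examples, and discusses development directions and application prospects. After briefly reviewing general weakly supervised learning models such as multiple instance learning (MIL) and the expectation-maximization (EM) algorithm, the paper summarizes work on object detection and localization from the perspectives of multiple instance learning and class attention map mechanisms, with emphasis on methods such as self-training and supervision form conversion. For semantic segmentation, it summarizes and analyzes progress according to weak supervision of different granularity, such as bounding box annotations, image-level class labels, scribble annotations, and point annotations, and mainly reviews weakly supervised instance segmentation methods based on image-level class labels and bounding box annotations. For video action recognition, it reviews models and algorithms under weak supervision forms such as movie scripts, action sequences, video-level class labels, and single-frame labels, and discusses the feasibility of each form in practical applications. On this basis, it further discusses the challenges and trends facing visual weakly supervised learning, aiming to provide a reference for related research.

Visual understanding, e.g., object detection, semantic/instance segmentation, and action recognition, plays a crucial role in many real-world applications, including human-machine interaction and autonomous driving. Recently, deep networks have made great progress in these tasks under the full supervision regime. Based on the convolutional neural network (CNN), a series of representative deep models have been developed for these visual understanding tasks, e.g., you only look once (YOLO) and Fast/Faster R-CNN (region CNN) for object detection, fully convolutional networks (FCN) and DeepLab for semantic segmentation, and Mask R-CNN and you only look at coefficients (YOLACT) for instance segmentation. More recently, driven by novel network backbones, e.g., Transformer, the performance on these tasks has been further boosted under the full supervision regime. However, supervised learning relies on massive accurate annotations, which are usually laborious and costly to obtain. Taking semantic segmentation as an example, collecting dense annotations, i.e., pixel-wise segmentation masks, is very laborious and costly, while weak supervision annotations, e.g., bounding box annotations and point annotations, are much easier to collect. Moreover, for video action recognition, the scenes in videos are very complicated, and it is likely to be impossible to annotate all the actions with accurate time intervals. Alternatively, weakly supervised learning is effective in reducing the cost of data annotation, and thus is very important to the development and application of visual understanding. Taking object detection, semantic/instance segmentation, and action recognition as examples, this article aims to provide a survey of recent progress in weakly supervised visual understanding, while pointing out several challenges and opportunities. To begin with, we introduce two representative weakly supervised learning methods, multiple instance learning (MIL) and the expectation-maximization (EM) algorithm. Despite the different network architectures in recent weakly supervised learning methods, m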

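Keywords: weakly supervised learning; object localization; object detection; semantic segmentation; instance segmentation; action recognition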

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
