An occlusion object detection method based on self-supervised mask image modeling (一种自监督掩码图像建模的遮挡目标检测方法)

Authors: FENG Xin (冯欣) [1]; HU Chenghang (胡成杭)

Affiliation: [1] College of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054, China

Source: Journal of Chongqing University of Technology (Natural Science), 2024, No. 6, pp. 186-193 (8 pages)

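Funding: Chongqing Graduate Research and Innovation Project (CYS23678); Chongqing University of Technology Graduate Education High-Quality Development Project (gzlcx20233194).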

Abstract: Object detection, a fundamental task in computer vision, requires both classifying objects and localizing them accurately. In real-world scenes, objects are often partially or fully occluded, which makes detection substantially harder. To improve the adaptability and accuracy of object detection networks across a wider range of occlusion scenarios, this paper proposes a self-supervised masked image modeling method. Training is split into two stages: pre-training and fine-tuning. In the pre-training stage, the network is trained on unlabeled images with a proxy task of local masking followed by reconstruction, which exposes the model to a range of occlusion patterns and degrees. In the fine-tuning stage, to handle the scale variation of occluded objects and the detection of objects of different sizes, a pyramid structure based on the Vision Transformer (ViT) is proposed. This ViT-FPN (Vision Transformer Feature Pyramid Network) improves the detector's ability to handle diverse occlusion scenarios. The method is evaluated on the CrowdHuman and CityPersons benchmark datasets. The comparative results show that the proposed self-supervised masked image modeling method outperforms other methods in detecting occluded objects.
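The pre-training proxy task in the abstract (mask a local region, then reconstruct it) can be illustrated with a minimal sketch. The abstract does not say how the local mask is sampled or how the reconstruction loss is computed, so everything below is an assumption for illustration only: the mask is modeled as one contiguous rectangle of patches (to mimic an occluder), and the loss is a mean squared error over masked patches only. The names `local_block_mask`, `masked_mse`, and `mask_ratio` are hypothetical and are not taken from the paper.

```python
import random


def local_block_mask(grid_h, grid_w, mask_ratio=0.4, seed=None):
    """Return a grid_h x grid_w grid of booleans; True marks a masked patch.

    Assumption: "local" masking is modeled as a single contiguous rectangle
    of patches covering roughly mask_ratio of the grid.
    """
    if not 0.0 < mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in (0, 1)")
    rng = random.Random(seed)
    target = max(1, round(mask_ratio * grid_h * grid_w))
    # Pick a block height, then the width that best matches the target area.
    block_h = rng.randint(1, grid_h)
    block_w = min(grid_w, max(1, round(target / block_h)))
    # Place the block so that it stays fully inside the grid.
    top = rng.randint(0, grid_h - block_h)
    left = rng.randint(0, grid_w - block_w)
    mask = [[False] * grid_w for _ in range(grid_h)]
    for r in range(top, top + block_h):
        for c in range(left, left + block_w):
            mask[r][c] = True
    return mask


def masked_mse(pred, target, mask):
    """Mean squared error computed over masked patches only.

    pred and target are grids of per-patch value lists (assumed layout).
    """
    total, count = 0.0, 0
    for r, row in enumerate(mask):
        for c, is_masked in enumerate(row):
            if is_masked:
                for p, t in zip(pred[r][c], target[r][c]):
                    total += (p - t) ** 2
                    count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    # A 224x224 image with 16x16 patches gives a 14x14 patch grid.
    mask = local_block_mask(14, 14, mask_ratio=0.4, seed=0)
    for row in mask:
        print("".join("#" if m else "." for m in row))
```

In a real pipeline the masked patches would be hidden from the encoder and a decoder would predict their pixel values; the sketch only shows the mask sampling and the masked loss, which are the two pieces the proxy task needs regardless of the backbone.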

Keywords: object detection; self-supervised; local masking; image modeling; Vision Transformer

Classification code: TP391.4 [Automation and Computer Technology: Computer Application Technology]
