Authors: FENG Xin (冯欣)[1], HU Chenghang (胡成杭); College of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054, China
Affiliation: [1] College of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054
Source: Journal of Chongqing University of Technology (Natural Science) (《重庆理工大学学报(自然科学)》), 2024, No. 6, pp. 186-193 (8 pages)
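Funding: Chongqing Graduate Research and Innovation Project (CYS23678); Chongqing University of Technology Graduate Education High-Quality Development Project (gzlcx20233194).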
Abstract: Object detection, a fundamental task in computer vision, must both classify objects and localize them accurately. In real-world scenes, however, objects are frequently partially or fully occluded, which makes detection substantially harder. To improve the adaptability and accuracy of object detection networks across a wider range of occlusion scenarios, this paper proposes a self-supervised masked image modeling method whose training is divided into two stages: pre-training and fine-tuning. In the pre-training stage, the model is trained on unlabeled images with a proxy task of local masking followed by reconstruction of the masked content, which exposes it to a range of occlusion patterns and degrees. In the fine-tuning stage, to handle the scale variation of occluded objects and the detection of objects of different sizes, the paper proposes a pyramid structure based on the Vision Transformer (ViT), called ViT-FPN (Vision Transformer Feature Pyramid Network), which improves the detector's ability to cope with diverse occlusion scenarios. Comparative experiments on the CrowdHuman and CityPersons benchmark datasets show that the proposed self-supervised masked image modeling method outperforms other methods in detecting occluded objects.
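The two training stages summarized in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the mask shape, mask ratio, patch size, loss, or the internal design of ViT-FPN, so the following are assumptions. "Local masking" is modeled as hiding one contiguous rectangular block of patches. Masked patches are replaced by a constant token, and only they contribute to a mean-squared reconstruction loss (a SimMIM-style choice). The pyramid is derived from a single stride-16 ViT feature map by nearest-neighbour upsampling and max pooling, a parameter-free stand-in for the learned layers a real ViT-FPN would likely use. All function names are hypothetical.

```python
import numpy as np


def patchify(img: np.ndarray, p: int):
    """Split an (H, W, C) image into (N, p*p*C) non-overlapping p x p patches, row-major."""
    h, w, c = img.shape
    assert h % p == 0 and w % p == 0, "image size must be divisible by the patch size"
    gh, gw = h // p, w // p
    x = img.reshape(gh, p, gw, p, c).transpose(0, 2, 1, 3, 4)  # (gh, gw, p, p, c)
    return x.reshape(gh * gw, p * p * c), (gh, gw)


def local_block_mask(gh: int, gw: int, block_h: int, block_w: int, rng: np.random.Generator):
    """Boolean mask over the gh*gw patch grid hiding one contiguous block (True = masked).

    Assumption: "local masking" is modeled as a single rectangular block of patches,
    loosely mimicking an occluder that covers part of an object.
    """
    assert 0 < block_h <= gh and 0 < block_w <= gw
    top = rng.integers(0, gh - block_h + 1)
    left = rng.integers(0, gw - block_w + 1)
    grid = np.zeros((gh, gw), dtype=bool)
    grid[top:top + block_h, left:left + block_w] = True
    return grid.reshape(-1)


def apply_mask(patches: np.ndarray, mask: np.ndarray, mask_token: float = 0.0):
    """Replace masked patches with a constant token before they reach the encoder."""
    out = patches.copy()
    out[mask] = mask_token
    return out


def masked_reconstruction_loss(pred: np.ndarray, target: np.ndarray, mask: np.ndarray) -> float:
    """Mean squared error averaged over masked patches only."""
    per_patch = ((pred - target) ** 2).mean(axis=1)
    return float(per_patch[mask].mean())


def simple_pyramid(feat: np.ndarray):
    """Derive strides 4, 8, 16, 32 from one stride-16 (h, w, d) ViT feature map.

    Parameter-free stand-in: nearest-neighbour upsampling and 2x2 max pooling
    replace the learned (de)convolutions a real ViT-FPN would use.
    """
    h, w, d = feat.shape
    assert h % 2 == 0 and w % 2 == 0
    up4 = feat.repeat(4, axis=0).repeat(4, axis=1)  # stride 4
    up2 = feat.repeat(2, axis=0).repeat(2, axis=1)  # stride 8
    down2 = feat.reshape(h // 2, 2, w // 2, 2, d).max(axis=(1, 3))  # stride 32
    return [up4, up2, feat, down2]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((64, 64, 3))
    patches, (gh, gw) = patchify(img, 16)  # 4 x 4 grid, 768 values per patch
    mask = local_block_mask(gh, gw, 2, 2, rng)
    encoder_input = apply_mask(patches, mask)  # would be fed to the ViT encoder
    pred = np.zeros_like(patches)  # placeholder for encoder + decoder output
    print("masked patches:", int(mask.sum()))
    print("loss:", masked_reconstruction_loss(pred, patches, mask))
    for level in simple_pyramid(rng.random((8, 8, 32))):
        print(level.shape)
```

In a real pipeline, `pred` would come from a ViT encoder plus a lightweight decoder during pre-training, and in fine-tuning each pyramid level would feed a detection head; the sketch only shows where masking happens, where the loss is taken, and how one single-scale map can be turned into multiple scales.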
Keywords: object detection; self-supervision; local masked image modeling; vision Transformer
CLC number: TP391.4 [Automation and Computer Technology / Computer Application Technology]