Authors: QIN Siyi (秦思怡), GAI Shaoyan (盖绍彦) [1,2], DA Feipeng (达飞鹏) [1,2]
Affiliations: [1] School of Automation, Southeast University, Nanjing 210096, Jiangsu, China [2] Key Laboratory of Measurement and Control of Complex Engineering Systems, Ministry of Education, Southeast University, Nanjing 210096, Jiangsu, China
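Source: Journal of Zhejiang University: Engineering Science (《浙江大学学报(工学版)》), 2024, No. 1, pp. 10-19 (10 pages)
Funding: Jiangsu Province Frontier Leading Technology Basic Research Special Project (BK20192004C); Priority Academic Program Development of Jiangsu Higher Education Institutions.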
Abstract: Existing deep learning-based video object detection algorithms fail to meet accuracy and efficiency requirements at the same time. To address this, a video object detection algorithm based on mixed weighted sampling and multi-level feature aggregation attention is proposed, built on the single-stage detector YOLOX-S. The mixed weighted reference-frame sampling (MWRS) strategy combines weighted random sampling with local consecutive sampling to make full use of effective global information and inter-frame local information. The multi-level feature aggregation attention (MFAA) module, based on the self-attention mechanism, refines the classification features extracted by YOLOX-S so that the network learns richer information from features at different levels. Experimental results show that the proposed algorithm reaches an average precision (AP50) of 77.8% on the ImageNet VID dataset at an average detection speed of 11.5 ms per frame. Its object classification and localization on test images are clearly better than those of YOLOX-S, indicating that the proposed algorithm achieves high accuracy together with fast detection speed.
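Keywords: machine vision; video object detection; feature aggregation; attention mechanism; YOLOX
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]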
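Illustration: the abstract describes MWRS only as a combination of weighted random sampling (global information) and local consecutive sampling (inter-frame local information). The Python sketch below shows one way such a mixed sampler could be put together; it is not the authors' implementation. The function name `sample_reference_frames`, its parameters, the per-frame `weights` (assumed positive and supplied by the caller), and the symmetric local window are all assumptions made for illustration, since the record does not say how the weights are computed or how wide the local window is.

```python
import random


def sample_reference_frames(num_frames, key_idx, weights, n_global, n_local, rng=None):
    """Hypothetical mixed reference-frame sampler (illustrative only).

    Returns (global_refs, local_refs):
      global_refs: frames drawn by weighted random sampling without replacement
      local_refs:  up to n_local consecutive frames on each side of the key frame
    """
    if len(weights) != num_frames:
        raise ValueError("weights must have one entry per frame")
    if not 0 <= key_idx < num_frames:
        raise ValueError("key_idx out of range")
    rng = rng or random.Random()

    # Global part: weighted random sampling over all frames except the key frame.
    # Weights are assumed positive; how to score frames is not given in the source.
    candidates = [i for i in range(num_frames) if i != key_idx]
    cand_weights = [weights[i] for i in candidates]
    global_refs = []
    for _ in range(min(n_global, len(candidates))):
        pick = rng.choices(range(len(candidates)), weights=cand_weights, k=1)[0]
        global_refs.append(candidates.pop(pick))
        cand_weights.pop(pick)

    # Local part: consecutive neighbours of the key frame, clipped to the video bounds.
    local_refs = [
        i
        for i in range(key_idx - n_local, key_idx + n_local + 1)
        if 0 <= i < num_frames and i != key_idx
    ]
    return global_refs, local_refs


if __name__ == "__main__":
    # Uniform weights are used here purely as a placeholder.
    g, l = sample_reference_frames(
        num_frames=20, key_idx=10, weights=[1.0] * 20,
        n_global=4, n_local=2, rng=random.Random(0),
    )
    print("global reference frames:", g)
    print("local reference frames:", l)
```

The two lists are returned separately so a caller can decide whether to merge them or treat global and local references differently; the abstract does not specify which.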