Authors: JU Zhiyong; LI Yuming; XUE Yongjie; YE Yuxin; LAI Ying (School of Optical Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200082, China)
Affiliation: [1] School of Optical Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200082, China
Source: Control Engineering of China, 2023, No. 10, pp. 1912-1926 (15 pages)
Funding: National Natural Science Foundation of China (Grant No. 81101116).
Abstract: To improve the accuracy of pedestrian detection algorithms in practical applications, the vit-YOLOv4 model is proposed, which integrates the Vision Transformer model and depthwise separable convolution into the YOLOv4 model. The Vision Transformer is added to the backbone feature extraction network and the spatial pyramid pooling (SPP) layer of the YOLOv4 model, giving full play to the multi-head attention mechanism of the Vision Transformer for preprocessing image features. At the same time, the stacked conventional convolutions in the path aggregation network (PANet) are replaced by depthwise separable convolutions, so that the model can extract more useful features in subsequent feature extraction. Experimental results show that the vit-YOLOv4 model improves the accuracy of pedestrian detection, reduces the missed detection rate, and achieves superior overall performance.
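The parameter saving that motivates replacing the stacked conventional convolutions in PANet with depthwise separable convolutions can be illustrated with a short sketch. The layer sizes below are illustrative assumptions, not values taken from the paper:

```python
def conv_params(k, c_in, c_out):
    # Standard convolution: one k x k x c_in kernel per output channel.
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # Depthwise step: one k x k kernel per input channel,
    # followed by a 1x1 pointwise convolution mixing the channels.
    return k * k * c_in + c_in * c_out

# Hypothetical 3x3 layer with 256 input and 512 output channels.
std = conv_params(3, 256, 512)                  # 1,179,648 weights
sep = depthwise_separable_params(3, 256, 512)   # 133,376 weights
print(f"standard: {std}, separable: {sep}, ratio: {sep / std:.3f}")
```

For this configuration the separable form uses roughly 11% of the weights of a standard convolution, which is why it frees capacity for extracting additional features in the later stages of the network.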
Keywords: pedestrian detection; YOLOv4; Vision Transformer; depthwise separable convolution; multi-head attention mechanism
CLC Number: TP18 [Automation and Computer Technology - Control Theory and Control Engineering]