Detecting the key points of tractor drivers under complex environments using improved YOLO-Pose

Cited by: 3


Authors: XU Hongmei [1,2]; YANG Hao; LI Yalin; ZHANG Wenjie; ZHAO Yabing; WU Qing

Affiliations: [1] College of Engineering, Huazhong Agricultural University, Wuhan 430070, China; [2] Key Laboratory of Agricultural Equipment in Mid-lower Reaches of the Yangtze River, Ministry of Agriculture and Rural Affairs, Wuhan 430070, China

Source: Transactions of the Chinese Society of Agricultural Engineering, 2023, Issue 16, pp. 139-149 (11 pages)

Funding: National Natural Science Foundation of China, General Program (52175232)

Abstract: In the complex operating environment of farmland, illumination, background, and occlusion cause the key points of tractor drivers to be missed or falsely detected. To address this problem, this study proposes a driver key point detection method for complex environments based on an improved YOLO-Pose.

First, a Swin Transformer encoder is embedded in the top-level C3 module of the backbone network CSPDarkNet53, with the encoder window size set to 8 and the number of self-attention heads set to 16. The encoder uses shifted-window multi-head self-attention (SW-MSA) to learn cross-window interactions, and a masking mechanism blocks invalid information exchange between pixels from regions that are not adjacent in the original feature map. Compared with the traditional ViT architecture, this design performs better on dense prediction and high-resolution vision tasks and captures global dependencies with high computational efficiency. The resulting global modelling capability improves the detection efficiency of key points under occlusion.

Second, RepGFPN, an efficient layer aggregation network with a skip-layer structure and cross-scale connections, is adopted as the neck network, and a P6 detection layer is added to the multi-scale output of the backbone network. Its CspStage module combines reparameterization with layer-aggregation connections to fuse high-level semantic information with low-level spatial information, which strengthens multi-scale detection. Third, to further optimize the neck network, the standard 3×3 convolution is replaced by a pyramid convolution with a four-level pyramid structure, whose kernel sizes increase level by level from bottom to top so that the receptive field adjusts adaptively. This reduces the number of model parameters while still capturing feature information at different levels. Finally, a coordinate attention mechanism is embedded to optimize the decoupled key point head, making the prediction process more sensitive to the spatial positions of the key points.

The experimental results show that the improved algorithm reaches an mAP0.5 (mean average precision at an object keypoint similarity Loks threshold of 0.5) of 89.59% and an mAP0.5:0.95 (mean average precision over Loks thresholds of 0.5, 0.55, …, 0.95) of 62.58%, which are 4.24 and 4.15 percentage points higher than the baseline model, respectively. The average detection time per image is 21.9 ms. Compared with the current mainstream key point detection networks Hourglass, HRNet-W32, and DEKR, mAP0.5 is higher by 7.94, 5.27, and 2.66 percentage points, and the model size is smaller by 257.5, 8.2, and 9.3 M, respectively. The improved key point detection algorithm offers high detection accuracy and inference speed, and can provide technical support for abnormal behavior recognition and state monitoring of tractor drivers.
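The reported mAP0.5 and mAP0.5:0.95 both depend on the object keypoint similarity (Loks) between a predicted pose and its label. As a minimal sketch of how that score is commonly computed, the snippet below follows the COCO keypoint convention; the abstract does not state the exact formula, key point set, or per-key-point constants used in the paper, so the function names, the `sigmas` values, and the sample coordinates are all hypothetical.

```python
import math

def oks(pred, truth, visible, area, sigmas):
    """Object keypoint similarity of one predicted pose against one label.

    Assumes the COCO convention: each labelled key point contributes
    exp(-d^2 / (2 * area * (2 * sigma)^2)), averaged over labelled points.
    """
    total, count = 0.0, 0
    for (px, py), (tx, ty), v, sigma in zip(pred, truth, visible, sigmas):
        if v <= 0:  # unlabelled key points are skipped
            continue
        d2 = (px - tx) ** 2 + (py - ty) ** 2  # squared pixel distance
        k = 2.0 * sigma                        # per-key-point constant
        total += math.exp(-d2 / (2.0 * area * k * k))
        count += 1
    return total / count if count else 0.0

# Thresholds 0.50, 0.55, ..., 0.95 used when averaging for mAP0.5:0.95.
OKS_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]

def thresholds_passed(score):
    """Return the thresholds at which a pose with this score counts as a match."""
    return [t for t in OKS_THRESHOLDS if score >= t]

if __name__ == "__main__":
    # Hypothetical three-point pose; sigmas are illustrative, not from the paper.
    truth = [(100.0, 50.0), (90.0, 120.0), (110.0, 120.0)]
    pred = [(103.0, 52.0), (91.0, 118.0), (125.0, 131.0)]
    score = oks(pred, truth, visible=[2, 2, 1], area=80.0 * 160.0,
                sigmas=[0.026, 0.079, 0.079])
    print(score, thresholds_passed(score))
```

A prediction counts as a true positive at a given threshold only when its score reaches that threshold, so mAP0.5:0.95 penalizes loosely localized key points more heavily than mAP0.5 does.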

Keywords: tractor; deep learning; detection; driver; YOLO-Pose; key points

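Classification: TP391.4 [Automation and Computer Technology: Computer Application Technology]; S24 [Automation and Computer Technology: Computer Science and Technology]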

 
