Authors: YIN Ling; LI Jiale; HUANG Bo [1]
Affiliation: [1] School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201600, China
Source: Software Guide (《软件导刊》), 2025, No. 2, pp. 146-154 (9 pages)
Funding: National Natural Science Foundation of China, Young Scientists Fund (61802251).
Abstract: To address the low efficiency and high training cost of current deep-learning-based layout analysis methods, this paper proposes RCW-YOLO, a single-stage object detection network improved from YOLOv5s, and applies it to document image layout analysis. First, the C3 module in YOLOv5s is improved with the Res2Net module, which strengthens the network's ability to extract multi-scale features from document images. Second, the lightweight upsampling operator CARAFE is introduced to optimize the feature fusion network and reduce information loss during upsampling. Finally, WIoUv3 is adopted as the bounding box regression loss function; its gradient gain allocation strategy assigns more weight to samples of average quality, which improves the model's generalization ability and overall performance. Experimental results show that RCW-YOLO achieves 87.2%, 76.4%, and 94.5% mAP@0.50:0.95 on the CDLA, IIIT-AR-13K, and PubLayNet datasets, respectively. It outperforms existing two-stage algorithms and other single-stage algorithms while requiring less computation, fewer parameters, and less inference time.
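Keywords: document image layout analysis; object detection; YOLOv5s; multi-scale feature extraction
Classification code: TP391.41 [Automation and Computer Technology: Computer Application Technology]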
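Note on the metric: the abstract reports mAP@0.50:0.95, which by the usual COCO-style convention averages precision over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below only illustrates the IoU computation and the threshold grid; it is not the paper's evaluation code, and the names `box_iou`, `IOU_THRESHOLDS`, and `thresholds_passed` are hypothetical helpers introduced here for illustration.

```python
def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the overlap rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The ten thresholds implied by the "0.50:0.95" notation: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred, gt):
    """Thresholds at which `pred` would count as a match for `gt`."""
    iou = box_iou(pred, gt)
    return [t for t in IOU_THRESHOLDS if iou >= t]
```

For example, a predicted region that overlaps a ground-truth layout block with IoU 0.72 counts as a match at the five thresholds from 0.50 to 0.70 and as a miss at the stricter ones, which is why this metric rewards tightly localized boxes.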