Research on Newspaper Layout Segmentation Method Based on Transformer

Authors: Zhu Yifan, Gao Hua, Ye Ning [1]

Affiliation: [1] College of Information Science and Technology & Artificial Intelligence, Nanjing Forestry University, Nanjing 210037, Jiangsu, China

Source: Journal of Nanjing Normal University (Natural Science Edition), 2025, No. 1, pp. 109-118 (10 pages)

Funding: National Key Research and Development Program of China (2016YFD0600101).

Abstract: In the era of big data, information retrieval and research call for digitizing vast archives of traditional print media, which remains a challenge. Advances in computer vision and artificial intelligence make it possible to apply the DETR model to newspaper layout segmentation. To address the original model's slow detection, large parameter count, and imprecise classification in this task, this paper proposes an improved model built on the lightweight ShuffleNet V2 backbone, which improves computational efficiency and reduces the number of model parameters, easing the computational load of the Transformer structure. Through a feature pyramid structure, the model fuses global and fine-grained information and significantly strengthens its recognition of multi-scale targets. The model also introduces an Efficient Channel Attention (ECA) module to extract key target features and suppress irrelevant background information, keeping the design lightweight while preserving segmentation performance. Experimental results show that, on the newspaper layout segmentation task, the improved model has 38.5 M parameters, runs at a frame rate (FPS) of 47.5 img/s, and reaches an mAP_0.5 of 0.806. Compared with the original DETR model, it has 2.8 M fewer parameters, a frame rate 28.3 img/s higher, and an mAP_0.5 improved by 3.2%. The proposed model can also provide upstream technical support for OCR of newspaper layouts.
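
The abstract reports the improved model's figures together with its gains over DETR, but not the DETR baseline itself. The sketch below back-computes the implied baseline from those numbers. The function name and dictionary keys are illustrative, and because the abstract does not say whether "3.2%" is an absolute or a relative gain in mAP_0.5, both readings are shown as assumptions.

```python
def implied_detr_baseline():
    """Back-compute baseline DETR figures from the numbers quoted in the abstract."""
    # Quoted for the improved model.
    params_m, fps, map50 = 38.5, 47.5, 0.806
    # Quoted gains over the original DETR model.
    params_saved_m, fps_gain, map_gain = 2.8, 28.3, 0.032

    base_params_m = round(params_m + params_saved_m, 1)
    base_fps = round(fps - fps_gain, 1)
    return {
        "params_m": base_params_m,
        "fps": base_fps,
        # Derived ratios; these do not appear in the abstract.
        "speedup": round(fps / base_fps, 2),
        "param_reduction_pct": round(100 * params_saved_m / base_params_m, 1),
        # Assumption A: "3.2%" means 3.2 percentage points (absolute).
        "map50_if_absolute": round(map50 - map_gain, 3),
        # Assumption B: "3.2%" means a 3.2% relative improvement.
        "map50_if_relative": round(map50 / (1 + map_gain), 3),
    }


if __name__ == "__main__":
    for key, value in implied_detr_baseline().items():
        print(f"{key}: {value}")
```

Under these readings the implied baseline is about 41.3 M parameters and 19.2 img/s, so the speed gain (roughly 2.5x) is far larger than the parameter saving (under 7%).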

Keywords: layout segmentation; DETR; ShuffleNet V2; feature pyramid; ECA channel attention

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
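
The ECA module named in the abstract can be illustrated with a dependency-free sketch of the computation as it is commonly described for ECA-Net: global average pooling, a small 1D convolution across neighbouring channels, a sigmoid gate, and channel-wise rescaling. This is not the authors' implementation. The function names are hypothetical, `gamma=2` and `b=1` are the defaults usually quoted for ECA rather than values given in this paper, and the kernel weights, which would be learned in a real model, are passed in by hand.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive kernel size: (log2(C) + b) / gamma, truncated, then made odd."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def eca(feature_map, weights):
    """Apply ECA-style gating.

    feature_map: list of C channels, each an H x W list of lists of floats.
    weights: odd-length 1D kernel shared by all channels (learned in practice).
    """
    # 1) Squeeze: global average pooling gives one descriptor per channel.
    desc = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_map]
    # 2) Local cross-channel interaction: zero-padded 1D convolution.
    pad = len(weights) // 2
    padded = [0.0] * pad + desc + [0.0] * pad
    mixed = [
        sum(w * padded[i + j] for j, w in enumerate(weights))
        for i in range(len(desc))
    ]
    # 3) Sigmoid turns each mixed descriptor into a gate in (0, 1).
    gates = [1.0 / (1.0 + math.exp(-m)) for m in mixed]
    # 4) Excite: rescale every value in a channel by that channel's gate.
    return [
        [[v * g for v in row] for row in ch]
        for ch, g in zip(feature_map, gates)
    ]
```

The appeal for a lightweight design is that the only learnable parameters are the few kernel weights, so the attention step adds almost nothing to the parameter count.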

 
