Improved detection of table row, column, and cell structure based on YOLOv8

Authors: REN Qiang [1,2]; IBRAYIM Mayire; HAMDULLA Askar [1,2,3]

Affiliations: [1] School of Computer Science and Technology, Xinjiang University, Urumqi 830017, China; [2] Key Laboratory of Signal Detection and Processing (Xinjiang University), Urumqi 830017, China; [3] School of Future Technology, Xinjiang University, Urumqi 830017, China

Source: China Sciencepaper (《中国科技论文》), 2024, No. 5, pp. 607-614 (8 pages)

Funding: National Natural Science Foundation of China (62166043, U2003207).

Abstract: Digital office documents now contain large amounts of tabular data, so demand for intelligent table structure recognition is growing rapidly. Table structures are tightly connected and vary widely in type, which makes table structure detection very difficult. To address this problem, a new method for detecting table rows, columns, and cells is proposed based on YOLOv8, using the ICDAR19-cTDaR table cell structure and the TabStructDB table row and column structure as experimental targets. First, a deformable convolution network (DCN) is introduced to strengthen feature extraction for table cells, rows, and columns. Second, spatial and channel reconstruction convolution (SCConv) is introduced; it extracts features effectively while reducing redundant features, which lowers complexity and computational cost. Based on these two convolutions, a new module, the DSC module, is designed to replace the Bottleneck module in C2f, and the resulting module is named C2fDSC. In addition, to further strengthen extraction of local corner features of the table structure, an explicit visual center feature adjustment (EVC) module is added to the YOLOv8 backbone network. Finally, the loss function of the original model is replaced with MPDIoU, which gives more accurate and efficient bounding box regression than the original loss function on densely packed objects. Experimental results show that the proposed algorithm achieves state-of-the-art (SOTA) results on the ICDAR19-cTDaR dataset, with cell precision, recall, and F1 score of 91.7%, 82.3%, and 86.7%, respectively. It also achieves practically useful performance for table row and column detection on the TabStructDB dataset.
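The abstract names MPDIoU but does not give its formula. As a minimal sketch, assuming the commonly cited definition (IoU minus the squared distances between the two boxes' top-left corners and bottom-right corners, each normalized by the squared diagonal of the input image), the loss could be written in plain Python as follows. The function name `mpdiou_loss` and the `(x1, y1, x2, y2)` box format are illustrative assumptions, not the authors' implementation.

```python
def mpdiou_loss(pred, target, img_w, img_h):
    """Sketch of an MPDIoU-style loss for one pair of boxes.

    Boxes are (x1, y1, x2, y2) in pixels with x1 < x2 and y1 < y2.
    img_w and img_h are the width and height of the input image.
    This is an assumed formulation, not code from the paper.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Intersection area (zero when the boxes do not overlap).
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h

    # Union area and plain IoU.
    area_p = (px2 - px1) * (py2 - py1)
    area_t = (tx2 - tx1) * (ty2 - ty1)
    union = area_p + area_t - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distances between matching corners.
    d_top_left = (px1 - tx1) ** 2 + (py1 - ty1) ** 2
    d_bottom_right = (px2 - tx2) ** 2 + (py2 - ty2) ** 2

    # Normalize by the squared image diagonal.
    norm = img_w ** 2 + img_h ** 2
    mpdiou = iou - d_top_left / norm - d_bottom_right / norm

    return 1.0 - mpdiou
```

Under this formulation the loss is zero only when the two boxes coincide, and it keeps changing with corner distance even when the boxes do not overlap, which is one plausible reason it suits densely packed cells.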

Keywords: YOLOv8; EVC module; C2fDSC module; MPDIoU loss function; state-of-the-art performance

Classification number: TP391.4 [Automation and Computer Technology: Computer Application Technology]

 
