Asymmetric geometry correction network for document images  Cited by: 4

AsymcNet: a document images-relevant asymmetric geometry correction network


Authors: Qin Hai (秦海); Li Yijie (李艺杰); Liang Qiaokang (梁桥康) [1,2]; Wang Yaonan (王耀南) (College of Electrical and Information Engineering, Hunan University, Changsha 410082, China; National Engineering Center of Robot Vision Perception and Control, Hunan University, Changsha 410082, China)

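Affiliations: [1] College of Electrical and Information Engineering, Hunan University, Changsha 410082 [2] National Engineering Center of Robot Vision Perception and Control, Changsha 410082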

Source: Journal of Image and Graphics (《中国图象图形学报》), 2023, No. 8, pp. 2314-2329 (16 pages)

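Funding: National Key Research and Development Program of China (2021YFC1910402); National Natural Science Foundation of China (62073129, U21A20490, 62293510); Natural Science Foundation of Hunan Province (2022JJ10020).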

Abstract: Objective Geometric correction of document images uses image processing to remove the warping, distortion, and skew introduced during image acquisition, improving both the visual quality of the original image and the accuracy of optical character recognition (OCR). Before deep learning became widespread, traditional image processing methods required auxiliary hardware such as laser scanners, or required the document to be photographed from multiple viewpoints, and their robustness was poor. Deep learning models avoid these shortcomings, but current models still have limitations. To address the defects of existing algorithms, a lightweight geometric correction network that integrates document region localization and correction, the asymmetric geometry correction network (AsymcNet), is proposed to correct document images end to end. Method AsymcNet consists of a segmentation network for document region localization and a regression network for regressing the correction grid; the two sub-networks are cascaded. Because of the segmentation network, AsymcNet achieves good correction results for document images captured at a variety of fields of view. In the regression network, the resolution of the output regression grid is reduced to lower AsymcNet's GPU memory consumption and run time during training and inference. Results On a self-built test dataset, AsymcNet was compared with four recent methods. AsymcNet raises the multi-scale structural similarity (MS-SSIM) of the original images from 0.318 to 0.467, lowers the local distortion (LD) from 33.608 to 11.615, and lowers the character error rate (CER) from 0.570 to 0.273. Compared with DFE-FC (displacement flow estimation with fully convolutional network), one of the better-performing existing methods, AsymcNet improves MS-SSIM by 0.036, reduces LD by 2.193, and reduces CER by 0.033, while its average processing time per image is only 8.85% of that of DFE-FC. Conclusion The experiments verify the effectiveness and advancement of the proposed AsymcNet.

Objective Electronic entry of paper documents is normally based on optical character recognition (OCR) technology. A typical OCR system consists of four sequential steps: image acquisition, image preprocessing, character recognition, and typesetting output. The acquired digital image will have some geometric distortion because the paper document may not be parallel to the plane of the image acquisition device, the lens of the device may itself introduce distortion, or the paper document may be deformed. These interferences and distortions are more severe when handheld capture devices such as mobile phone cameras are used. Highly robust computer vision correction algorithms therefore focus on removing the geometric distortions introduced when paper documents are imaged. Current research concentrates on neural network-based geometric correction of document images. Compared with traditional geometric correction algorithms, neural network-based document image correction algorithms show potential in terms of both hardware requirements and algorithm implementation. However, optimizing their processing performance remains challenging, especially in offline and lightweight settings. Geometric correction of document images handles distortion, aberration, skew, and other geometric perturbations introduced during image capture, in order to improve the visual quality and OCR accuracy of the original image. Conventional image processing methods require auxiliary hardware such as laser scanners, or documents captured from multiple views, and the algorithms are not robust. Emerging deep learning methods can overcome the shortcomings of traditional algorithms through modeling, but these models still have certain limitations. We therefore develop a lightweight geometric correction network (AsymcNet) that integrates document region localization and correction to implement geometric correction…
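The character error rate (CER) reported in the abstract is commonly defined as the Levenshtein edit distance between the OCR output and the reference text, divided by the reference length. The abstract does not state the exact definition or OCR engine used, so the following is only a minimal sketch under that common definition; the function names `edit_distance` and `character_error_rate` are illustrative and not taken from the paper.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = edit distance / reference length (assumed definition)."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

Under this definition, a lower CER after correction means the OCR output of the rectified image is closer to the ground-truth text, which is how the reported drop from 0.570 to 0.273 would be read.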

Keywords: image preprocessing; geometric correction; fully convolutional network (FCN); grid sampling; end-to-end
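To illustrate how a low-resolution regression grid can still drive a full-resolution correction through grid sampling, the sketch below upsamples a coarse backward-mapping grid with bilinear interpolation and then samples the source image at the resulting coordinates. It is a plain-Python illustration, not the paper's implementation: the grid convention (normalized source coordinates in [0, 1]), the interpolation scheme, and the names `bilinear` and `warp_with_coarse_grid` are all assumptions.

```python
def bilinear(grid, y, x):
    """Bilinearly interpolate a 2-D list `grid` at fractional index (y, x)."""
    h, w = len(grid), len(grid[0])
    y = min(max(y, 0.0), h - 1)  # clamp to the valid index range
    x = min(max(x, 0.0), w - 1)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    fy, fx = y - y0, x - x0
    top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
    bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def warp_with_coarse_grid(image, grid_x, grid_y, out_h, out_w):
    """Warp `image` with a coarse backward-mapping grid.

    grid_x / grid_y hold, for each coarse cell corner, the normalized
    source coordinate in [0, 1] that the output pixel should be read from
    (assumed convention).
    """
    gh, gw = len(grid_x), len(grid_x[0])
    src_h, src_w = len(image), len(image[0])
    out = []
    for i in range(out_h):
        # Position of output row i inside the coarse grid.
        gy = i * (gh - 1) / (out_h - 1) if out_h > 1 else 0.0
        row = []
        for j in range(out_w):
            gx = j * (gw - 1) / (out_w - 1) if out_w > 1 else 0.0
            # Upsample the coarse grid, then convert to source pixel units.
            sx = bilinear(grid_x, gy, gx) * (src_w - 1)
            sy = bilinear(grid_y, gy, gx) * (src_h - 1)
            row.append(bilinear(image, sy, sx))
        out.append(row)
    return out
```

With a 2×2 identity grid (`grid_x = [[0, 1], [0, 1]]`, `grid_y = [[0, 0], [1, 1]]`) the output reproduces the input image. Because the network only has to predict the small grid, its output tensor stays small regardless of the final image size, which is consistent with the memory and run-time savings described in the abstract.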

CLC number: TP391.4 [Automation and Computer Technology - Computer Application Technology]

 
