Source: Journal of Image and Graphics (中国图象图形学报), 2020, Issue 2, pp. 311-320 (10 pages)
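Funding: National Natural Science Foundation of China (61771347); Guangdong Province Characteristic Innovation Project (2017KTSCX181); Guangdong Province Young Innovative Talents Project (2017KQNCX206); Jiangmen Science and Technology Program (Jiangke [2017] No. 268); Wuyi University Youth Fund Project (2015zk11).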
Abstract:
Objective: In document image layout analysis, mainstream deep learning methods overcome the shortcomings of traditional methods and can localize and classify layout regions simultaneously, but most of them require complex preprocessing and have complex model structures. In addition, the shortage of document image data prevents general-purpose deep learning models from performing well on document layout analysis. To address these problems, a deep learning method based on a multi-feature fusion convolutional neural network is proposed.
Method: First, convolution kernels of different sizes extract features from the input image in parallel, and the resulting feature maps are fused to form a feature fusion module. Then, the serial-parallel spatial pyramid strategy of DeepLab V3 is adopted, and image-level features are added to further refine the extracted feature maps. Finally, bilinear interpolation restores the output to the image size, completing the localization and recognition of the document layout objects, namely figures, tables, and formulas.
Result: Mean intersection over union (mIoU) and pixel accuracy (PA) are used as evaluation metrics. Experiments on the ICDAR 2017 POD document layout object detection dataset show that the proposed algorithm reaches 87.26% mIoU and 98.10% PA. Compared with FCN (fully convolutional networks), it improves mIoU and PA by approximately 14.66% and 2.22%, respectively, and the proposed feature fusion module contributes gains of 1.45% in mIoU and 0.22% in PA.
Conclusion: The proposed algorithm localizes and recognizes multiple types of document layout objects simultaneously within a single network framework, requires no complex image preprocessing for training, and has a simple model structure. Experimental results show that it achieves good recognition performance with limited training data, outperforming FCN and DeepLab V3.
English abstract:
Objective: Document image layout analysis aims to segment a page into regions according to its content and to identify those regions quickly. Because each type of region is handled differently, different strategies must be developed for different layout objects; the layout of a document image must therefore be analyzed first to facilitate subsequent processing. Traditional document image layout analysis methods are generally based on complex rules. Their locate-first, classify-later approach cannot localize and classify layout regions simultaneously, and different document images need their own specific strategies, which limits versatility. Compared with the feature representations of traditional methods, deep learning models have powerful representation and modeling capabilities and adapt better to complex object detection tasks. Proposal-based networks, such as the faster region-based convolutional neural network (Faster R-CNN) and the region-based fully convolutional network (R-FCN), and proposal-free networks, such as the single shot multibox detector (SSD) and you only look once (YOLO), are representative object-level detection networks. Pixel-level detection networks, such as fully convolutional networks and the DeepLab series, have further enabled deep learning to make breakthroughs in object detection. Object detection techniques at both the object level and the pixel level have been applied to document layout analysis. However, most current deep-learning-based methods require complex preprocessing, such as color coding, image binarization, and simple rules, which makes the model structure complex. Moreover, this complicated preprocessing discards considerable information from the document image, which affects recognition accuracy. In addition, common deep learning models are difficult to apply to small datasets. To address these problems, this paper proposes a deep learning method based on a multi-feature fusion convolutional neural network.
Keywords: document image processing; layout analysis; object detection; deep learning; semantic segmentation
CLC number: TP391.41 [Automation and Computer Technology: Computer Application Technology]
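The Method step builds a feature fusion module by convolving the input with kernels of different sizes in parallel and then fusing the resulting feature maps. The abstract specifies neither the kernel sizes, the channel counts, nor the fusion operator, so the following is only a minimal single-channel sketch under assumed choices: hypothetical 1×1, 3×3 and 5×5 averaging kernels (`box_kernel`) stand in for learned weights, zero "same" padding keeps every branch at the input size, and fusion is assumed to be channel-wise concatenation. The function names are illustrative, not taken from the paper.

```python
from typing import List, Sequence

Grid = List[List[float]]


def conv2d_same(image: Grid, kernel: Grid) -> Grid:
    """2D cross-correlation (the operation CNN layers compute) with zero
    'same' padding; odd-sized square kernels keep the output size equal
    to the input size."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    r = k // 2  # kernel radius
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    y, x = i + a - r, j + b - r
                    if 0 <= y < h and 0 <= x < w:  # outside = zero padding
                        acc += kernel[a][b] * image[y][x]
            out[i][j] = acc
    return out


def box_kernel(size: int) -> Grid:
    """Hypothetical averaging kernel standing in for learned weights."""
    v = 1.0 / (size * size)
    return [[v] * size for _ in range(size)]


def multi_kernel_fusion(image: Grid, sizes: Sequence[int] = (1, 3, 5)) -> List[Grid]:
    """Parallel branches with different receptive fields, fused by
    concatenation along the channel axis (assumed fusion operator)."""
    return [conv2d_same(image, box_kernel(s)) for s in sizes]


if __name__ == "__main__":
    img = [[float((i + j) % 3) for j in range(6)] for i in range(6)]
    fused = multi_kernel_fusion(img)
    print(len(fused), len(fused[0]), len(fused[0][0]))  # 3 6 6
```

Equal-size branch outputs are what make the fusion possible: concatenation (or an element-wise sum, another common choice) only works when every branch produces a map of identical height and width.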
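The second Method step borrows DeepLab V3's spatial pyramid strategy and adds image-level features. The sketch below shows only the parallel part, atrous spatial pyramid pooling (ASPP), on a single channel. It assumes hypothetical 3×3 weights and dilation rates of 1, 2 and 3 chosen to suit a toy 8×8 input (DeepLab V3 itself uses rates 6, 12 and 18 at output stride 16). The image-level branch is global average pooling broadcast back to the map size, which is what bilinearly upsampling a 1×1 map yields. The learned 1×1 convolutions inside the branches are reduced to identity here, and the cascaded (serial) atrous modules are omitted; all names are illustrative.

```python
from typing import List, Sequence

Grid = List[List[float]]


def atrous_conv3x3(image: Grid, kernel: Grid, rate: int) -> Grid:
    """3x3 atrous (dilated) convolution, cross-correlation style as in CNN
    layers, with zero 'same' padding so the output keeps the input size."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(3):
                for b in range(3):
                    # Taps are spaced `rate` pixels apart around (i, j).
                    y, x = i + (a - 1) * rate, j + (b - 1) * rate
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[a][b] * image[y][x]
            out[i][j] = acc
    return out


def image_level_feature(image: Grid) -> Grid:
    """Global average pooling; bilinearly upsampling the resulting 1x1 map
    back to the input size just broadcasts the mean."""
    h, w = len(image), len(image[0])
    mean = sum(map(sum, image)) / (h * w)
    return [[mean] * w for _ in range(h)]


def aspp(image: Grid, kernel: Grid, rates: Sequence[int] = (1, 2, 3)) -> List[Grid]:
    """Parallel branches concatenated along the channel axis: a 1x1 branch
    (identity weight, assumed), one atrous 3x3 branch per rate, and the
    image-level branch."""
    branches = [[row[:] for row in image]]
    branches += [atrous_conv3x3(image, kernel, r) for r in rates]
    branches.append(image_level_feature(image))
    return branches


def project_1x1(channels: List[Grid], weights: Sequence[float]) -> Grid:
    """1x1 convolution over the concatenated channels: a per-pixel weighted
    sum across channels."""
    h, w = len(channels[0]), len(channels[0][0])
    return [[sum(wt * ch[i][j] for wt, ch in zip(weights, channels))
             for j in range(w)] for i in range(h)]


if __name__ == "__main__":
    img = [[float(i * 8 + j) for j in range(8)] for i in range(8)]
    avg = [[1.0 / 9] * 3 for _ in range(3)]  # hypothetical 3x3 weights
    feats = aspp(img, avg)
    fused = project_1x1(feats, [1.0 / len(feats)] * len(feats))
    print(len(feats), len(fused), len(fused[0]))  # 5 8 8
```

A dilation rate r spreads the nine taps of a 3×3 kernel over a (2r + 1) × (2r + 1) window without adding weights, which is how the parallel branches see layout objects at several scales at once.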
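The final Method step restores the coarse prediction to image size by bilinear interpolation, after which each pixel can be assigned a layout class. Below is a minimal pure-Python sketch. It assumes the align-corners convention (the corner pixels of the input and output grids coincide) and a hypothetical class order of background, figure, table, formula. `bilinear_resize` and `argmax_labels` are illustrative names, not from the paper.

```python
import math
from typing import List

Grid = List[List[float]]


def bilinear_resize(grid: Grid, out_h: int, out_w: int) -> Grid:
    """Resize a 2D map by bilinear interpolation (align-corners convention:
    the corner pixels of the input and output grids coincide)."""
    in_h, in_w = len(grid), len(grid[0])
    # Step between output samples, measured in input pixels.
    sy = (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
    sx = (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
    out = []
    for i in range(out_h):
        y = i * sy
        y0 = min(int(math.floor(y)), in_h - 1)
        y1 = min(y0 + 1, in_h - 1)
        dy = y - y0
        row = []
        for j in range(out_w):
            x = j * sx
            x0 = min(int(math.floor(x)), in_w - 1)
            x1 = min(x0 + 1, in_w - 1)
            dx = x - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = grid[y0][x0] * (1 - dx) + grid[y0][x1] * dx
            bottom = grid[y1][x0] * (1 - dx) + grid[y1][x1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        out.append(row)
    return out


def argmax_labels(score_maps: List[Grid]) -> List[List[int]]:
    """Assign each pixel the index of its highest-scoring class map."""
    h, w = len(score_maps[0]), len(score_maps[0][0])
    return [[max(range(len(score_maps)), key=lambda c: score_maps[c][i][j])
             for j in range(w)] for i in range(h)]


if __name__ == "__main__":
    print(bilinear_resize([[0.0, 1.0], [2.0, 3.0]], 3, 3))
    # [[0.0, 0.5, 1.0], [1.0, 1.5, 2.0], [2.0, 2.5, 3.0]]
    # Hypothetical coarse scores for background, figure, table, formula.
    scores = [
        [[0.9, 0.1], [0.8, 0.2]],
        [[0.05, 0.7], [0.1, 0.1]],
        [[0.03, 0.1], [0.05, 0.6]],
        [[0.02, 0.1], [0.05, 0.1]],
    ]
    labels = argmax_labels([bilinear_resize(s, 4, 4) for s in scores])
    for row in labels:
        print(row)
```

Interpolating the per-class score maps before taking the per-pixel maximum, as done here, gives smoother region boundaries than upsampling the discrete labels themselves.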
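The Result section scores the model with pixel accuracy (PA), the fraction of pixels whose predicted class matches the ground truth, and mean intersection over union (mIoU), the per-class ratio TP / (TP + FP + FN) averaged over classes. The sketch below computes both from a confusion matrix. Skipping classes that appear in neither map is an assumed convention, since the abstract does not say how absent classes are handled. The toy label maps are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[int]]


def confusion_matrix(gt: Sequence[Sequence[int]],
                     pred: Sequence[Sequence[int]],
                     num_classes: int) -> Matrix:
    """Count pixels per (ground-truth class, predicted class) pair.
    Rows index the ground truth, columns the prediction."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g_row, p_row in zip(gt, pred):
        for g, p in zip(g_row, p_row):
            cm[g][p] += 1
    return cm


def pixel_accuracy(cm: Matrix) -> float:
    """PA: correctly labelled pixels divided by all pixels."""
    correct = sum(cm[c][c] for c in range(len(cm)))
    return correct / sum(map(sum, cm))


def mean_iou(cm: Matrix) -> float:
    """mIoU: mean over classes of TP / (TP + FP + FN).
    Classes absent from both maps are skipped (assumed convention)."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # pixels of class c missed
        fp = sum(cm[r][c] for r in range(n)) - tp  # other pixels labelled c
        if tp + fp + fn > 0:
            ious.append(tp / (tp + fp + fn))
    return sum(ious) / len(ious)


if __name__ == "__main__":
    # Toy 2x4 label maps, 0 = background, 1 = table (hypothetical labels).
    gt = [[0, 0, 1, 1], [0, 0, 1, 1]]
    pred = [[0, 1, 1, 1], [0, 0, 1, 1]]
    cm = confusion_matrix(gt, pred, num_classes=2)
    print(f"PA={pixel_accuracy(cm):.3f} mIoU={mean_iou(cm):.3f}")
    # PA=0.875 mIoU=0.775
```

Because PA counts every pixel equally, a page dominated by background can score a high PA even when small objects such as formulas are segmented poorly, whereas mIoU gives each class equal weight; this is one plausible reason the reported PA (98.10%) sits well above the reported mIoU (87.26%).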