Document image binarization method for separating complex backgrounds  Cited by: 5

A complex background-related binarization method for document-contextual information processing


Authors: Wang Hongxia [1], Wu Jiali, Chen Deshan [2] (School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan 430061, China; Intelligent Transportation System Research Center, Wuhan University of Technology, Wuhan 430061, China)

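Affiliations: [1] School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan 430061; [2] Intelligent Transportation System Research Center, Wuhan University of Technology, Wuhan 430061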

Source: Journal of Image and Graphics (中国图象图形学报), 2023, No. 7, pp. 2011-2025, 15 pages

Funding: National Science Fund for Young Scientists (51609193).

Abstract: Objective: Binarization methods rely mainly on low-level semantic features such as pixel color and contrast, so identifying complex backgrounds whose low-level features resemble those of text is a pressing problem for binarization. To address the separation of complex backgrounds in document image binarization, a two-stage binarization method that separates the complex background of document images is proposed. Method: The method is divided into two processing stages, screening of easily misjudged pixels and binarization segmentation, and two networks with different structures are built according to this division of labor. The first network strengthens the ability to identify and separate easily misjudged pixels in complex backgrounds, and the second focuses on accurate prediction of text pixels, which improves the performance of the whole method on images with complex backgrounds. Because each network handles only its own task, each can perform well with a reduced parameter count, which further improves network efficiency. In addition, to improve the handling of text detail, an asymmetric encoder-decoder structure is proposed, together with two ways of combining it. Results: The method is compared with other methods on the DIBCO2016, DIBCO2017, and DIBCO2018 datasets of the competition on document image binarization (DIBCO). On DIBCO2018 the method reaches an FM (F-measure) of 92.35%, only 0.17% below a method that uses special preprocessing, and its overall results are better than those of the other methods. On DIBCO2017 and DIBCO2016 the FM is 93.46% and 92.13%, respectively, with the best overall results among all methods. The experiments also show that the asymmetric encoder-decoder structure improves every binarization segmentation metric to varying degrees. Conclusion: The proposed two-stage method effectively distinguishes complex backgrounds, further improves binarization quality, and achieves excellent results on the DIBCO datasets. The open-source code is available at https://github.com/wjlbnw/Mask_Detail_Net.

Objective: Document image binarization is an essential step for optical character recognition (OCR), and handling complex backgrounds has recently become a key concern. To recognize text in an image faster and more accurately, document image binarization segments a captured color image or a derived grayscale image and outputs an image that contains only the text. In the era of big data, mass data storage requires text to be converted from hard copy to electronic form. However, a huge amount of textual information is still stored on hard copy, and moving it into electronic storage still depends on manual entry. Document image binarization has benefited from the shift to digital text carriers, and learning-based techniques have driven progress in text image binarization. Multiple end-to-end convolutional neural network (CNN) models have been applied to the binarization of text images. Method: Compared with traditional threshold-based document image binarization methods, deep learning-based methods incorporate the semantic distribution characteristics of text pixels, and CNN-based binarization methods are accurate to a certain extent. However, these methods still struggle with text images that have complex backgrounds, where they suffer from high false-positive rates and insufficient training data. The network model overfits easily, the intermediate network layers are hard to activate during training, and CNN-based feature extraction still focuses only on low-level semantic features. Binarization methods rely mainly on low-level semantic features such as pixel color and contrast, so complex backgrounds whose low-level features resemble those of text need to be picked out. We develop a two-stage binarization method to address the separation of complex backgrounds in document images. The method is s
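The two-stage idea and the F-measure (FM) reported in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the fusion rule in `combine_stages` (drop any pixel that stage one flags as confusable background), the 0.5 threshold, and all function and variable names are assumptions made for illustration. Only the F-measure definition, the harmonic mean of precision and recall over text pixels, is standard.

```python
def combine_stages(background_mask, text_prob, threshold=0.5):
    """Hypothetical fusion of the two stages (an assumption, not the paper's rule).

    background_mask: 2D list of bools, True where stage one flags a pixel
                     as easily misjudged background.
    text_prob:       2D list of floats, stage-two text probability per pixel.
    Returns a 2D list of bools, True where the pixel is kept as text.
    """
    return [
        [prob >= threshold and not flagged
         for prob, flagged in zip(prob_row, mask_row)]
        for prob_row, mask_row in zip(text_prob, background_mask)
    ]


def f_measure(pred_text, gt_text):
    """F-measure over text pixels: 2 * P * R / (P + R)."""
    pairs = [
        (bool(p), bool(g))
        for pred_row, gt_row in zip(pred_text, gt_text)
        for p, g in zip(pred_row, gt_row)
    ]
    tp = sum(p and g for p, g in pairs)        # text predicted as text
    fp = sum(p and not g for p, g in pairs)    # background predicted as text
    fn = sum(g and not p for p, g in pairs)    # text that was missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy 2x4 image: the third column is a text-like background stain.
gt_text = [[True, True, False, False],
           [False, True, False, False]]
text_prob = [[0.9, 0.8, 0.7, 0.1],
             [0.2, 0.4, 0.6, 0.1]]
background_mask = [[False, False, True, False],
                   [False, False, True, False]]

single_stage = [[p >= 0.5 for p in row] for row in text_prob]
two_stage = combine_stages(background_mask, text_prob)

fm_single = f_measure(single_stage, gt_text)  # about 0.57
fm_two = f_measure(two_stage, gt_text)        # about 0.80
```

In this toy case, masking the stain removes both false positives, so precision rises from 0.5 to 1.0 while recall stays at 2/3. That is the kind of gain the abstract attributes to separating the complex background before segmentation.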

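Keywords: semantic segmentation; U-Net; document image recognition; binarization; complex background; encoder-decoder structure; multi-stage segmentation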

Classification number: TP391 [Automation and Computer Technology - Computer Application Technology]

 
