A fast binarization method for dark and uneven illumination document images  Cited by: 7

Authors: WANG Kang-wei (王康维); ZHAO Lei (赵磊); HUANG Xin-yan (黄鑫炎); PENG Yu-fa (彭玉发); MA Si-yuan (马思远); FAN Hong-bo (范虹伯)

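Affiliation: [1] School of Applied Sciences, Harbin University of Science and Technology, Harbin, Heilongjiang Province 150000, China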

Source: Journal of Optoelectronics·Laser (《光电子·激光》), 2020, No. 12, pp. 1333-1340 (8 pages)

Funding: Supported by the College Students' Innovation and Entrepreneurship Training Program (201810214035).

Abstract: Binarization is an important step in optical character recognition (OCR) and directly affects its accuracy. Local binarization algorithms based on luminance segmentation currently give good results, but they are complex and time-consuming. Fast binarization algorithms are simple but sensitive to noise. Low-luminance images generally contain non-negligible noise, and their text has low contrast. To capture low-contrast text, a fast binarization algorithm must be sensitive to luminance gradients, yet this sensitivity also causes broken or missing text and heavy background noise in the result. To achieve fast, high-quality binarization, this paper uses non-local means filtering to suppress noise while avoiding over-smoothing of the image. An improved Bradley algorithm then extracts the low-contrast text and resolves problems such as text breakage. Finally, dilation and erosion are applied to suppress binarization noise. The method is suitable for unevenly illuminated images of both low and high luminance. Experimental results show that under uneven high luminance the method performs on par with other fast binarization algorithms, while under uneven low luminance it extracts more text with less text breakage and less noise. The OCR recall rate on the binarization results of this method reaches 93.5%.
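As a rough illustration of the thresholding and clean-up stages named in the abstract, the following is a minimal pure-Python sketch. It implements the standard Bradley-Roth integral-image threshold, not the paper's improved variant, whose modifications the abstract does not describe. The window size, the sensitivity `t`, the 3x3 structuring element, and the erode-then-dilate order are all assumptions made for this example, and the non-local means denoising step is left out entirely.

```python
def bradley_threshold(img, window=15, t=15):
    """Standard Bradley-Roth adaptive threshold (not the paper's variant).

    img: list of rows of ints in 0..255. Returns 0 for text, 255 for background.
    window and t (percent below the local mean) are assumed values.
    """
    h, w = len(img), len(img[0])
    # Integral image with a zero border: integ[y][x] = sum of img[0:y][0:x].
    integ = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            integ[y + 1][x + 1] = integ[y][x + 1] + row_sum
    half = window // 2
    out = [[255] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - half), min(h, y + half + 1)
        for x in range(w):
            x0, x1 = max(0, x - half), min(w, x + half + 1)
            count = (y1 - y0) * (x1 - x0)
            total = integ[y1][x1] - integ[y0][x1] - integ[y1][x0] + integ[y0][x0]
            # Text if the pixel is more than t percent darker than the local mean.
            if img[y][x] * count * 100 < total * (100 - t):
                out[y][x] = 0
    return out


def _morph(binary, rule):
    """Apply a 3x3 neighborhood rule (any/all) to the text pixels (value 0)."""
    h, w = len(binary), len(binary[0])
    out = [[255] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            neigh = [binary[j][i]
                     for j in range(max(0, y - 1), min(h, y + 2))
                     for i in range(max(0, x - 1), min(w, x + 2))]
            if rule(v == 0 for v in neigh):
                out[y][x] = 0
    return out


def dilate(binary):
    """Grow text regions by one pixel."""
    return _morph(binary, any)


def erode(binary):
    """Shrink text regions by one pixel; isolated specks vanish."""
    return _morph(binary, all)


def make_demo_image():
    """Hypothetical 20x20 dark image: brightness ramp, one stroke, one speck."""
    img = [[40 + x for x in range(20)] for _ in range(20)]
    for y in range(8, 11):          # 3-pixel-thick low-contrast stroke
        for x in range(3, 17):
            img[y][x] = (40 + x) // 2
    img[2][15] = 10                 # single-pixel noise speck
    return img


binary = bradley_threshold(make_demo_image())
cleaned = dilate(erode(binary))     # assumed order: erosion, then dilation
```

A 3x3 erosion also removes strokes thinner than three pixels, so on real scans the structuring element and the order of the two operations would need tuning against the stroke width.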

Keywords: pattern recognition; binarization; document images; uneven illumination; Bradley algorithm; non-local means filtering

Classification code: TP391.43 [Automation and Computer Technology: Computer Application Technology]

 
