Preprocessing algorithm for long curve text recognition (original title: 面向识别的长弯曲文本预处理算法)

Authors: LIU Xintian, FENG Jie [1], ZHU Minghang, MA Hanjie [1], ZHENG Yayu [2]

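Affiliations: [1] School of Computer Science and Technology (School of Artificial Intelligence), Zhejiang Sci-Tech University, Hangzhou 310018, China; [2] College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China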

Source: Intelligent Computer and Applications (《智能计算机与应用》), 2024, No. 12, pp. 10-17 (8 pages)

Funding: Zhejiang Provincial Science and Technology Program Project (2021C01163).

Abstract: Optical Character Recognition (OCR) is the process of scanning text images and then analyzing and processing them to extract their textual content. Current OCR algorithms, however, generally perform poorly on long, curved text. To address this, a recognition-oriented preprocessing algorithm for long curved text is proposed: a Long Curve Text Processing (LCTP) module is added before text line recognition to improve the recognition accuracy of all text lines in an image. First, after text region detection, each single long curved text line is extracted and interference information is removed. Second, the key inflection points of each curved text line are computed from the features of that line. Next, each text line is segmented and merged using its key inflection points. Finally, the segmented and merged text lines are fed into a text line recognition model to obtain the final recognition result. Comparative experiments against the mainstream OCR frameworks PP-OCR and Tesseract OCR on Long Curve Text, a dataset of manually collected long curved text images, show improvements in the LA (Localization Accuracy), MED (Minimum Edit Distance), and NED (Normalized Edit Distance) metrics. Compared with PP-OCR, LA improves by 49.5%, while MED and NED decrease by 44115 and 0.182, respectively. Compared with Tesseract OCR, LA improves by 3.2%, while MED and NED decrease by 30282 and 0.125, respectively. Ablation experiments were also performed on the Long Curve Text dataset to validate the effectiveness of LCTP, and timing comparisons of the individual LCTP components were conducted to validate its efficiency. The results indicate that LCTP improves the recognition accuracy of long curved text and, overall, yields more accurate and effective recognition results.
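The abstract reports MED and NED without defining them, so the following is only a minimal sketch of how such edit-distance metrics are commonly computed. It assumes that the edit distance is the standard Levenshtein distance, that NED divides it by the length of the longer string, and that the corpus-level MED is a sum over text lines (the reported decreases of 44115 and 30282 suggest a total rather than a per-line value). The paper's exact definitions may differ, and the function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev: List[int] = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(pred: str, truth: str) -> float:
    """Edit distance divided by the longer length (an assumed normalization)."""
    longest = max(len(pred), len(truth))
    return 0.0 if longest == 0 else edit_distance(pred, truth) / longest


def corpus_scores(pairs: Iterable[Tuple[str, str]]) -> Tuple[int, float]:
    """Return (summed edit distance, mean NED) over (prediction, truth) pairs."""
    pairs = list(pairs)
    if not pairs:
        return 0, 0.0
    total = sum(edit_distance(p, t) for p, t in pairs)
    mean_ned = sum(normalized_edit_distance(p, t) for p, t in pairs) / len(pairs)
    return total, mean_ned
```

Under these assumptions, lower values of both scores mean the recognized text is closer to the ground truth, which matches the abstract's reporting of decreases in MED and NED as improvements.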

Keywords: long curved text; interference information; key inflection points; segmentation and merging

Classification: TP391.41 [Automation and Computer Technology: Computer Application Technology]
