Deep learning methods for scene text detection and recognition


Authors: Liu Chongyu, Chen Xiaoxue[1], Luo Canjie, Jin Lianwen[1], Xue Yang[1], Liu Yuliang (School of Electronics and Information Engineering, South China University of Technology, Guangzhou 510640, China)

Affiliation: [1] School of Electronics and Information Engineering, South China University of Technology, Guangzhou 510640

Source: Journal of Image and Graphics (《中国图象图形学报》), 2021, No. 6, pp. 1330-1367 (38 pages)

Funding: National Natural Science Foundation of China (61936003, 61771199); Natural Science Foundation of Guangdong Province (2017A030312006, 2021A1515011870).

Abstract: Many natural scene images contain rich text, which plays an important role in scene understanding. With the rapid development of mobile internet technology, many new applications, such as signboard recognition and autonomous driving, need to exploit this text information. Consequently, the analysis and processing of scene text has increasingly become one of the research hotspots in computer vision; the task mainly comprises text detection and text recognition. Traditional text detection and recognition methods rely on hand-crafted features and rules, and their models are complex to design, inefficient, and generalize poorly. With the development of deep learning, scene text detection, scene text recognition, and end-to-end scene text detection and recognition have all achieved breakthroughs, with significant gains in both performance and efficiency. This paper introduces the research background of the field; categorizes, summarizes, and reviews deep learning-based methods for scene text detection, recognition, and end-to-end detection and recognition; and explains the basic ideas, advantages, and disadvantages of each category. For the methods in each category, it further discusses and analyzes the algorithmic pipelines, applicable scenarios, and technical evolution of the main models. In addition, it lists several mainstream public datasets and compares the performance of the models on representative datasets. Finally, it summarizes the limitations of current scene text detection, recognition, and end-to-end detection and recognition algorithms on different kinds of scene data, together with future challenges and development trends.

With the rapid development of internet and mobile internet technologies, many new applications, such as signboard recognition and autonomous driving, require extensive use of the rich text information in natural scenes. Thus, the analysis and processing of scene text play an essential role in this field and have increasingly become one of the research hotspots in computer vision. Traditional text detection and recognition methods often rely on manually designed features, which involve a large amount of computation and yield low efficiency. These methods also generalize poorly to complex scenes. With the development of deep learning in recent years, convolutional neural networks have made great progress in scene text detection and recognition. These deep learning-based methods outperform traditional ones by a large margin and have become the mainstream approach to reading text in the wild. Scene text detection methods can be divided into two categories according to the target objects they detect: top-down methods and bottom-up methods. Top-down methods mainly inherit the basic idea of general object detection or instance segmentation and directly regress the entire bounding box of each text instance. In contrast, bottom-up methods, following the idea of traditional methods, first detect components of a text instance and then group them together through rules. Bottom-up methods are more effective than top-down methods at detecting text of arbitrary shapes and orientations, and they are less sensitive to text scale. However, grouping the detected components into different text instances requires complex design and processing, which makes the inference stage of bottom-up approaches inefficient. These methods also have difficulty detecting long text, and adjacent text instances tend to stick together (text conglutination) when dense text is detected. Top-down methods do not have this issue and can achieve higher precision for text detection…
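The bottom-up idea described in the abstract (detect components, then group them into text instances by rules) can be illustrated with a minimal sketch. It assumes that components are axis-aligned boxes `(x0, y0, x1, y1)` with positive height, and it uses a hypothetical linking rule (enough vertical overlap plus a small horizontal gap) with union-find grouping. The function names `group_components` and `_should_link` and the thresholds `min_v_overlap` and `max_gap` are illustrative assumptions, not the rules of any particular method covered by the survey.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), assumes x0 < x1 and y0 < y1


def _find(parent: List[int], i: int) -> int:
    # Union-find lookup with path halving.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def _should_link(a: Box, b: Box, min_v_overlap: float, max_gap: float) -> bool:
    # Hypothetical rule: two components belong to the same text line if they
    # overlap enough vertically and are horizontally close (gap is measured
    # relative to their mean height).
    h_a, h_b = a[3] - a[1], b[3] - b[1]
    v_overlap = min(a[3], b[3]) - max(a[1], b[1])
    if v_overlap <= 0 or v_overlap / min(h_a, h_b) < min_v_overlap:
        return False
    gap = max(a[0], b[0]) - min(a[2], b[2])  # negative when boxes overlap horizontally
    return gap <= max_gap * (h_a + h_b) / 2


def group_components(boxes: List[Box], min_v_overlap: float = 0.7,
                     max_gap: float = 1.0) -> List[Box]:
    """Group component boxes into text-instance boxes (bottom-up sketch)."""
    n = len(boxes)
    parent = list(range(n))
    # Link every pair of components that satisfies the rule; linking is transitive.
    for i in range(n):
        for j in range(i + 1, n):
            if _should_link(boxes[i], boxes[j], min_v_overlap, max_gap):
                parent[_find(parent, i)] = _find(parent, j)
    groups: Dict[int, List[Box]] = {}
    for i in range(n):
        groups.setdefault(_find(parent, i), []).append(boxes[i])
    # Each text instance is the enclosing box of its grouped components.
    result = [
        (min(b[0] for b in g), min(b[1] for b in g),
         max(b[2] for b in g), max(b[3] for b in g))
        for g in groups.values()
    ]
    return sorted(result)


# Three components on one line and two on another yield two text instances.
lines = group_components([(0, 0, 10, 10), (12, 0, 22, 10), (24, 1, 34, 11),
                          (0, 30, 10, 40), (11, 30, 21, 40)])
print(lines)
```

The pairwise loop is quadratic in the number of components, which hints at why the abstract calls grouping costly at inference time. Loosening the rule also reproduces text conglutination in this toy setting: with `min_v_overlap=0.1`, the two vertically adjacent boxes `(0, 0, 10, 10)` and `(0, 8, 10, 18)` merge into one instance, whereas the default threshold keeps them separate.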

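Keywords: scene text detection; scene text recognition (STR); end-to-end scene text detection and recognition; deep learning; optical character recognition (OCR); survey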

CLC number: TP391.4 [Automation and Computer Technology - Computer Application Technology]
