End-to-end Network Structure Optimization of Scene Text Recognition Based on Residual Connection

Cited by: 1


Authors: HUANG Jin-xing (黄金星), PAN Xiang (潘翔)[1], ZHENG He-rong (郑河荣)[1]; College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China

Affiliation: [1] College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023

Source: Computer Science (计算机科学), 2020, No. 8, pp. 221-226 (6 pages)

Funding: National Natural Science Foundation of China (61871350)

Abstract: Existing text recognition networks are not deep enough, which lowers their recognition accuracy. To address this problem, this paper proposes an improved end-to-end text recognition network structure. First, the text is treated as a sequence: residual modules extract features from the text image, and the resulting features are split column by column into feature vectors that are fed to the recurrent layer. The residual structure increases the depth of the convolutional network, preserving its ability to represent the text image and capture the text information. Moreover, each residual module uses stacked layers to learn a residual mapping, which improves the convergence of the network as the number of layers increases. Second, a recurrent layer models the context of this feature sequence, and its output is passed to a Softmax layer that predicts the label for each position in the sequence, so text of arbitrary length can be recognized. The recurrent layer uses Long Short-Term Memory (LSTM) to learn the dependencies within the text and to alleviate the vanishing gradient problem when training on long sequences. Finally, the text labels are transcribed with the optimal path method, which finds the path with the highest probability and outputs the sequence corresponding to that path as the optimal sequence. The improved structure increases network depth, which improves the feature description of text images and the stability under noise. Experiments comparing the proposed algorithm with existing typical algorithms on several test datasets (ICDAR2003, ICDAR2013, SVT and IIIT5K) show that the network structure achieves higher scene text recognition accuracy, which verifies its effectiveness.
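The residual idea described in the abstract (stacked layers learn a residual mapping F(x) while a shortcut carries the input x forward, so a block outputs F(x) + x) can be sketched as follows. This is a minimal, hypothetical illustration rather than the authors' network: dense layers stand in for the convolutions of a real residual module, batch normalization is omitted, and the names `residual_block` and `residual_stack` are made up for the example.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Dense layer without bias: multiply w (rows x cols) by x (length cols)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def residual_block(x: Vector, w1: Matrix, w2: Matrix) -> Vector:
    """y = relu(F(x) + x), where F(x) = w2 @ relu(w1 @ x).

    The two stacked layers only have to learn the residual F(x); the
    identity shortcut carries x through unchanged. Dense layers are a
    stand-in (assumption) for the convolutions of a real recognizer.
    """
    residual = matvec(w2, relu(matvec(w1, x)))  # F(x): the residual mapping
    return relu([r + xi for r, xi in zip(residual, x)])  # shortcut adds x back


def residual_stack(x: Vector, blocks: List[Tuple[Matrix, Matrix]]) -> Vector:
    """Apply residual blocks one after another to build a deeper network."""
    for w1, w2 in blocks:
        x = residual_block(x, w1, w2)
    return x
```

With all-zero weights every block reduces to the identity on non-negative inputs, so `residual_stack([1.0, 2.0], [(zero, zero)] * 50)` (with `zero` a 2 x 2 all-zero matrix) still returns `[1.0, 2.0]` at depth 50. This is one way to see why adding residual blocks need not make the deeper network harder to optimize.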
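The column-wise split of the convolutional features into a sequence of feature vectors can be pictured as follows, assuming the residual (convolutional) stage produces a feature map laid out as channels x height x width. This is a hypothetical sketch: the abstract gives neither the actual map size nor the ordering of values inside each vector, and `map_to_sequence` is an illustrative name.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # assumed layout: fmap[channel][row][column]


def map_to_sequence(fmap: FeatureMap) -> List[List[float]]:
    """Slice a C x H x W feature map into W column vectors, left to right.

    Each column becomes one time step: a vector of length C * H holding every
    channel value in that column, so the sequence length equals the width.
    """
    channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [
        [fmap[c][h][w] for c in range(channels) for h in range(height)]
        for w in range(width)
    ]
```

Each of the W vectors corresponds to one vertical slice of the feature map, read left to right, which is what lets the recurrent layer treat the text line as a sequence. For example, the 2 x 1 x 3 map `[[[1.0, 2.0, 3.0]], [[4.0, 5.0, 6.0]]]` becomes `[[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]`.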
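The recurrent layer's use of LSTM can be sketched with a single unidirectional LSTM cell run over the column sequence, using the standard gate equations. The abstract does not say whether the paper's recurrent layer is bidirectional or how many layers it stacks, so this is an assumption-laden sketch; `zero_params` is a hypothetical helper for checking shapes, not trained weights.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Gate = Tuple[Matrix, Matrix, Vector]  # (W: input weights, U: recurrent weights, b: bias)


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def gate_preactivation(gate: Gate, x: Vector, h: Vector) -> Vector:
    """Compute W x + U h + b for one gate."""
    w, u, b = gate
    return [
        sum(wi * xi for wi, xi in zip(w_row, x))
        + sum(ui * hi for ui, hi in zip(u_row, h))
        + bi
        for w_row, u_row, bi in zip(w, u, b)
    ]


def lstm_step(x: Vector, h: Vector, c: Vector,
              params: Dict[str, Gate]) -> Tuple[Vector, Vector]:
    """One LSTM time step; returns the new hidden state h and cell state c."""
    i = [sigmoid(v) for v in gate_preactivation(params["i"], x, h)]  # input gate
    f = [sigmoid(v) for v in gate_preactivation(params["f"], x, h)]  # forget gate
    o = [sigmoid(v) for v in gate_preactivation(params["o"], x, h)]  # output gate
    g = [math.tanh(v) for v in gate_preactivation(params["g"], x, h)]  # candidate
    # Additive cell update: the forget gate scales the old memory instead of
    # repeatedly multiplying it by a weight matrix.
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def run_lstm(sequence: List[Vector], params: Dict[str, Gate],
             hidden_size: int) -> List[Vector]:
    """Run the LSTM over the column sequence, one output per time step."""
    h, c = [0.0] * hidden_size, [0.0] * hidden_size
    outputs = []
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
        outputs.append(h)
    return outputs


def zero_params(input_size: int, hidden_size: int) -> Dict[str, Gate]:
    """Hypothetical helper: all-zero weights, useful for checking shapes."""
    def gate() -> Gate:
        return (
            [[0.0] * input_size for _ in range(hidden_size)],
            [[0.0] * hidden_size for _ in range(hidden_size)],
            [0.0] * hidden_size,
        )
    return {name: gate() for name in ("i", "f", "o", "g")}
```

Each output vector would then go to a per-position Softmax over the character set. The part that matters for long sequences is the cell update c = f * c + i * g: because the old memory is carried forward additively, gradients tend to shrink less over many time steps than in a plain recurrent network.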
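The final transcription step, choosing the path with the highest probability and outputting its sequence, matches best path decoding for CTC-style outputs. The sketch below assumes that interpretation, which the abstract does not state explicitly: label 0 is a hypothetical "blank" symbol, and a path is mapped to text by merging consecutive repeats and then removing blanks. `softmax` supplies the per-position label probabilities attributed to the Softmax layer; `best_path_decode` is an illustrative name.

```python
import math
from typing import List, Sequence, Tuple

BLANK = 0  # assumption: label 0 is a CTC-style "blank" (no character)


def softmax(logits: Sequence[float]) -> List[float]:
    """Per-position Softmax; subtracting the max keeps exp() from overflowing."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def best_path_decode(frame_logits: Sequence[Sequence[float]],
                     alphabet: Sequence[str]) -> Tuple[str, float]:
    """Return the text for the most probable path and that path's probability."""
    probs = [softmax(frame) for frame in frame_logits]
    # Positions are scored independently, so the product of the chosen
    # probabilities is maximized by taking the argmax at every time step.
    path = [max(range(len(p)), key=lambda k: p[k]) for p in probs]
    path_prob = 1.0
    for p, k in zip(probs, path):
        path_prob *= p[k]
    # Map the path to a label sequence: merge consecutive repeats, drop blanks.
    chars = []
    prev = None
    for k in path:
        if k != prev and k != BLANK:
            chars.append(alphabet[k])
        prev = k
    return "".join(chars), path_prob
```

With `alphabet = ["-", "c", "a", "t"]` and frames whose argmax path is c, c, -, a, t, t, the function returns "cat"; a blank between two a frames is what allows a doubled letter such as "aa". Best path decoding is fast, but it returns the single most probable path, which is not guaranteed to correspond to the most probable label sequence, since many paths collapse to the same text.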

Keywords: residual connection; scene text recognition; stacked layers; network depth; optimal path

CLC number: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
