Natural scene text recognition based on character attention (基于字符注意力的自然场景文本识别)

Cited by: 2

Authors: XIONG Wei (熊炜) [1,2,3]; SUN Peng (孙鹏); ZHAO Di (赵迪); LIU Yue (刘粤)

Affiliations: [1] School of Electrical and Electronic Engineering, Hubei University of Technology, Wuhan, Hubei 430068, China; [2] Xiangyang Industrial Research Institute, Hubei University of Technology, Xiangyang, Hubei 441003, China; [3] Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29201, USA

Source: Journal of Optoelectronics·Laser (《光电子·激光》), 2023, No. 11, pp. 1158-1167 (10 pages)

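Funding: Supported by the National Natural Science Foundation of China (61571182, 61601177); the Natural Science Foundation of Hubei Province (2019CFB530); the Major Special Project of the Hubei Provincial Department of Science and Technology (2019ZYYD020); the Research Project of the Xiangyang Industrial Research Institute, Hubei University of Technology (XYYJ2022C05); and the China Scholarship Council (201808420418).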

Abstract: Natural scene text recognition methods that extract visual features with fixed-size convolution kernels and then only classify characters have weak global modeling ability and neglect the importance of text semantic modeling. This paper therefore proposes a natural scene text recognition method based on character attention. First, a multi-level efficient Swin Transformer, rather than a convolutional network, is built to extract features; it lets the features of different windows exchange information. Second, a character attention module (CAM) is designed so that the network focuses on the features of character regions and extracts more discriminative visual features. A semantic reasoning module (SRM) then models the text sequence from the context of each character, yielding semantic features that correct hard-to-distinguish or blurred characters. Finally, the visual and semantic features are fused and classified to obtain the character recognition result. Experiments show a recognition accuracy of 95.2% on the regular-text dataset IC13 and 85.8% on the irregular curved-text dataset CUTE. Ablation and comparison experiments demonstrate the feasibility of the proposed method.
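The abstract names the CAM and the visual-semantic fusion step but gives no formulas for either. As a rough illustration only, the sketch below assumes scaled dot-product attention over flattened feature positions for the attention step and a sigmoid-gated blend for the fusion step. Both choices, and every name in the code (`character_attention`, `gated_fusion`, `queries`, `gate_w`, `gate_b`), are hypothetical and are not taken from the paper; the random vectors stand in for backbone and SRM outputs.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def character_attention(features, queries):
    """Attend over all feature positions once per character slot.

    features: N vectors of size D (a flattened feature map; stand-in data here)
    queries:  T vectors of size D (one hypothetical query per character slot)
    returns:  (T glimpse vectors of size D, T attention maps of length N)
    """
    d = len(features[0])
    glimpses, maps = [], []
    for q in queries:
        # Assumed scoring rule: scaled dot product between query and position.
        weights = softmax([dot(q, f) / math.sqrt(d) for f in features])
        glimpse = [sum(w * f[j] for w, f in zip(weights, features)) for j in range(d)]
        glimpses.append(glimpse)
        maps.append(weights)
    return glimpses, maps


def gated_fusion(visual, semantic, gate_w, gate_b):
    """Assumed fusion: g = sigmoid(W [v; s] + b), output = g * v + (1 - g) * s."""
    fused = []
    for v, s in zip(visual, semantic):
        concat = v + s  # list concatenation, i.e. [v; s]
        out = []
        for j in range(len(v)):
            g = 1.0 / (1.0 + math.exp(-(dot(gate_w[j], concat) + gate_b[j])))
            out.append(g * v[j] + (1.0 - g) * s[j])
        fused.append(out)
    return fused


# Toy sizes: N positions, D channels, T character slots (all illustrative).
random.seed(0)
N, D, T = 6, 4, 3
features = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N)]
queries = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]
semantic = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]
gate_w = [[random.gauss(0, 0.1) for _ in range(2 * D)] for _ in range(D)]
gate_b = [0.0] * D

visual, attn = character_attention(features, queries)
fused = gated_fusion(visual, semantic, gate_w, gate_b)
```

Each attention map is a probability distribution over positions, so a glimpse is a weighted average of the features most relevant to one character slot. Because the gate lies strictly between 0 and 1, every fused value falls between the corresponding visual and semantic values. In a real recognizer a classifier over the character set would follow the fusion step.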

Keywords: Swin Transformer; character attention; semantic reasoning; feature fusion

Classification code: TP391.1 [Automation and Computer Technology: Computer Application Technology]
