Multi-path Parallel Location Association Network for Natural Scene Text Recognition

Original title: 一种用于自然场景文本识别的多路并行位置关联网络

Cited by: 1

Authors: CHEN Min (陈敏) [1,2]; YE Dong-yi (叶东毅); CHEN Yu-zhong (陈羽中) [1,2]

Affiliations: [1] College of Computer and Big Data, Fuzhou University, Fuzhou 350116, China; [2] Fujian Provincial Key Laboratory of Network Computing and Intelligent Information Processing, Fuzhou 350116, China

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2023, No. 4, pp. 699-705 (7 pages)

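Funding: Supported by the National Natural Science Foundation of China (61672158, 61972097, U21A204721); the Fujian Province Science and Technology Major Special Project (Science-Education Joint, 2021HZ022007); the Fujian Province University Industry-University-Research Cooperation Project (2021H6022); and the General Program of the Natural Science Foundation of Fujian Province (2020J001494).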

Abstract: Natural scene text recognition is an active research topic in computer vision, with broad application prospects in autonomous driving, image retrieval, and robot navigation. Text images in natural scenes often have complex backgrounds, perspective distortion, and excessive curvature, which makes recognition highly challenging. To address these problems, this paper proposes a natural scene text recognition method based on a Multi-Path Parallel Location Association Network (MPLAN). First, for irregular text images, MPLAN uses a text rectification network to adaptively learn an image transformation that yields linearly arranged text images. Second, to capture positional information between characters, MPLAN introduces a position association module, which exploits the ordering of sequence features and captures character position information to improve the alignment accuracy between sequence features and target characters. In addition, to strengthen the semantic correlation between characters, MPLAN introduces a parallel attention module based on the idea of multi-path transmission, which obtains global semantic information and enables contextual communication among sequence features, thereby locating the valid characters. Experimental results on six datasets, covering both regular and irregular text, show that, compared with the baseline algorithms, MPLAN effectively uses position information and global semantic information to decode character sequences, and achieves state-of-the-art performance, especially on irregular text.

Keywords: deep learning; scene text recognition; attention mechanism; end-to-end

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
