Scene Text Detection Based on the Cross-Receptive-Field Network (CrossNet)

Authors: ZHAO Xu; ZHAO Chaoyang; DU Xiaojie; ZHANG Zhenqing; LIU Songyan; GUO Haiyun [1]; TANG Ming [1,2]; WANG Jinqiao

Affiliations: [1] Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; [2] University of Chinese Academy of Sciences, Beijing 100049, China; [3] Railway Police College, Zhengzhou, Henan 450053, China; [4] School of Information, Yunnan University, Kunming, Yunnan 650504, China

Source: Radio Communications Technology (《无线电通信技术》), 2021, No. 3, pp. 363-368 (6 pages)

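Funding: Technology Research Program of the Ministry of Public Security (2018JSYJC23); Fundamental Research Funds for the Central Universities (2017TJJBKY001); National Natural Science Foundation of China (62002357, 61806200, 61772527).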

Abstract: Existing scene text detection methods usually adopt backbone networks designed for image classification on ImageNet. Because text instances differ markedly from the natural objects in ImageNet in aspect ratio, appearance texture, and scale, these classification networks are poorly suited to scene text detection. To address this problem, this paper proposes CrossNet (Cross-Receptive-Field Network), a backbone network tailored for scene text detection. The basic element of CrossNet is a three-path block, the Cross-Receptive-Field Block (CrossRecepBlock). Since scene text usually appears in rectangular shapes, the CrossRecepBlock uses rectangular convolution kernels in place of the usual square ones to guide the network to learn effective receptive fields better suited to scene text detection. The paper also proposes a design principle for text detection backbones: network width is very important, while network depth should not be too large. CrossNet is built on this principle. EAST with a CrossNet backbone outperforms the original ResNet-50-based method by a large margin, achieving a state-of-the-art F-score of 82.5% on ICDAR2015.
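The abstract does not spell out what the three paths of the CrossRecepBlock contain, so the following is only a minimal sketch of the rectangular-kernel idea, not the authors' implementation. It assumes a 1×k horizontal path, a k×1 vertical path, and a 1×1 path whose outputs are summed; the function names `conv2d_same`, `cross_block`, and `footprint`, the all-ones weights, and the kernel length k = 5 are all hypothetical choices made for illustration. Feeding a single-pixel impulse through the block shows which positions interact, and the resulting footprint is a cross rather than a square.

```python
def conv2d_same(image, kernel):
    """Naive 2-D cross-correlation with zero padding (output size == input size)."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2  # padding that keeps the output size unchanged
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    yy, xx = y + i - ph, x + j - pw
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[i][j]
            out[y][x] = acc
    return out


def cross_block(image, k=5):
    """Assumed three-path block: 1xk + kx1 + 1x1 paths, summed element-wise."""
    horizontal = conv2d_same(image, [[1.0] * k])               # 1 x k kernel
    vertical = conv2d_same(image, [[1.0] for _ in range(k)])   # k x 1 kernel
    pointwise = conv2d_same(image, [[1.0]])                    # 1 x 1 kernel
    h, w = len(image), len(image[0])
    return [
        [horizontal[y][x] + vertical[y][x] + pointwise[y][x] for x in range(w)]
        for y in range(h)
    ]


def footprint(k=5, size=7):
    """Mark every position that responds to a single impulse at the centre."""
    impulse = [[0.0] * size for _ in range(size)]
    impulse[size // 2][size // 2] = 1.0
    response = cross_block(impulse, k)
    return ["".join("#" if value != 0 else "." for value in row) for row in response]


if __name__ == "__main__":
    for row in footprint():
        print(row)
```

Under these assumptions the footprint covers 2k - 1 = 9 positions for k = 5, whereas a square 5×5 kernel would cover 25, which is one way to see how rectangular kernels can extend the receptive field along the text direction at a lower cost.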

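Keywords: scene text detection; convolutional neural network; backbone network architecture; cross-receptive-field network (CrossNet)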

Classification: TP391.4 [Automation and Computer Technology: Computer Application Technology]
