Scale differentiated text detection method focusing on hard examples  Cited by: 4


Authors: LIN Hong[1]; LU Yao-yao (College of Computer Science and Technology, Wuhan University of Technology, Wuhan 430063, China)

Affiliation: [1] College of Computer Science and Technology, Wuhan University of Technology

Source: Journal of Zhejiang University: Engineering Science, 2019, No. 8, pp. 1506-1516 (11 pages)

Abstract: Text detection accuracy is difficult to improve because the information in the intermediate feature layers of convolutional neural networks is underused and because learning does not distinguish between scales or between hard and easy examples. To address this problem, a scale-differentiated text detection method for natural scene images that focuses on hard examples is proposed, based on multi-channel refined feature fusion. Multi-channel refined fusion layers of a convolutional neural network are constructed to extract high-resolution feature maps. Text instances are divided into three scale ranges according to the length of the longer side of their annotated rectangular boxes and are assigned to different proposal networks, each of which extracts the corresponding proposals. A focal loss function is designed to concentrate learning on hard examples, which improves the expressive ability of the model and yields the target text bounding boxes. Experiments show that the proposed multi-channel refined feature extraction method achieves high text recall on the COCO-Text dataset, and that the scale-differentiated detection method focusing on hard examples reaches detection accuracies of 0.89 and 0.83 on the ICDAR2013 and ICDAR2015 standard datasets, respectively. Compared with methods such as CTPN and RRPN, the proposed method is more robust on multi-scale, multi-orientation natural scene images.
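Two of the steps named in the abstract, assigning a text instance to one of three scale ranges by the longer side of its box and down-weighting easy examples with a focal loss, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract gives no scale boundaries, so `SMALL_MAX` and `MEDIUM_MAX` are hypothetical values. The loss is the commonly used binary focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), with the usual defaults alpha = 0.25 and gamma = 2, which may differ from the loss the paper designs. The function names are likewise chosen only for illustration.

```python
import math

# Hypothetical boundaries (in pixels) on the longer side of a box. The abstract
# states that there are three scale ranges but does not give the thresholds.
SMALL_MAX = 64
MEDIUM_MAX = 256


def scale_bucket(width, height, small_max=SMALL_MAX, medium_max=MEDIUM_MAX):
    """Return 'small', 'medium' or 'large' based on the longer side of a box."""
    longer = max(width, height)
    if longer <= small_max:
        return "small"
    if longer <= medium_max:
        return "medium"
    return "large"


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Common binary focal loss for a single prediction (assumed form).

    p: predicted probability of the text class.
    y: 1 for text, 0 for background.
    """
    # p_t is the probability the model assigns to the true class.
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # Clamp away from zero so that log() stays finite.
    p_t = min(max(p_t, eps), 1.0)
    # (1 - p_t) ** gamma shrinks the loss of well-classified (easy) examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With these definitions, a confidently correct prediction such as `focal_loss(0.9, 1)` contributes far less than a hard one such as `focal_loss(0.1, 1)`, which is the effect the abstract describes as focusing learning on hard examples. Setting `gamma=0` reduces the expression to an alpha-weighted cross-entropy.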

Keywords: deep learning; natural scene; text detection; feature fusion; hard examples; focal loss

Classification code: TP391 [Automation and Computer Technology - Computer Application Technology]

 
