Scene Text Detection Algorithm Based on Feature Enhancement  Cited by: 1


Authors: GAO Nan[1]; ZHANG Lei; LIANG Ronghua[1]; CHEN Peng; FU Zheng (College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China)

Affiliation: [1] College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China

Source: Computer Science (《计算机科学》), 2024, No. 6, pp. 256-263 (8 pages)

Funding: National Natural Science Foundation of China (61702456, 62036009, U1909203); National Key Research and Development Program of China (2020YFB1707700).

Abstract: To address the missed and false detections caused by complex backgrounds and widely varying text scales in natural-scene images, this paper proposes a scene text detection algorithm based on feature enhancement. In the feature pyramid fusion stage, a Dual-domain Attention Feature Fusion Module (D2AAFM) is proposed; it fuses feature maps of different semantic levels and scales more effectively, which improves the representation of text information. In addition, because the deep feature maps of the network lose semantic information during upsampling and fusion, a Multi-scale Spatial Perception Module (MSPM) is proposed; it enlarges the receptive field to capture broader contextual information and strengthens the text semantics of the deep feature maps, thereby reducing missed and false detections. To evaluate the proposed algorithm, experiments were conducted on the public datasets ICDAR2015, CTW1500, and MSRA-TD500, where the method achieved F-measures of 82.8%, 83.4%, and 85.3%, respectively. The experimental results show that the algorithm detects text well across different datasets.
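Two quantities that the abstract relies on can be made concrete with a short Python sketch. The first function shows how a dilated (atrous) convolution, which appears among the keywords, widens the span covered by a single kernel without adding weights. Whether MSPM uses these particular kernel sizes or dilation rates is not stated in the abstract, so the rates below are hypothetical. The second function computes the F-measure as the harmonic mean of precision and recall, which is the usual definition of this metric. The precision and recall values are illustrative only and are not results from the paper.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span, in input pixels, covered by one dilated convolution kernel.

    A dilation of d inserts (d - 1) gaps between neighbouring kernel taps,
    so a k-tap kernel covers k + (k - 1) * (d - 1) input positions.
    """
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F-value)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical dilation rates for a 3x3 kernel (not taken from the paper).
    for dilation in (1, 2, 4):
        print(dilation, effective_kernel_size(3, dilation))

    # Hypothetical precision and recall (not results from the paper).
    print(round(f_measure(0.90, 0.80), 4))
```

With a 3x3 kernel, dilation rates of 1, 2, and 4 cover 3, 5, and 9 input pixels per side, which is one way parallel branches can gather context at several scales with the same number of parameters.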

Keywords: deep learning; scene text detection; attention mechanism; multi-scale feature fusion; dilated convolution

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

 
