Text detection method combining Segformer with an enhanced feature pyramid


Authors: ZHANG Mingquan [1,2]; ZHANG Zeen; CAO Jingang; SHAO Xuqiang [1,2]

Affiliations: [1] School of Control and Computer Engineering, North China Electric Power University, Baoding 071003, Hebei, China; [2] Engineering Research Center of Intelligent Computing for Complex Energy Systems, Ministry of Education, North China Electric Power University, Baoding 071003, Hebei, China

Source: CAAI Transactions on Intelligent Systems (《智能系统学报》), 2024, No. 5, pp. 1111-1125 (15 pages)

Funding: Fundamental Research Funds for the Central Universities (2021MS092); Hebei Provincial Science and Technology Program (22310302D).

Abstract: To address missed detection of small-scale text, false detection of text-like pixels, and inaccurate edge localization in text detection algorithms for natural scenes, a text detection model based on Segformer and an enhanced feature pyramid is proposed. First, the model uses an MiT-B2 (Mix Transformer) encoder to generate multiscale feature maps. Next, in the upsampling part of the decoder, which has a feature pyramid structure, a cascaded fusion attention module is introduced; it obtains global channel information and preserves text features through global average pooling, global max pooling, and the Ghost module. Then, in the feature fusion part of the decoder, a two-level orthogonal fusion attention module uses asymmetric convolutions to enhance information in the horizontal and vertical directions separately. Finally, the results are post-processed using differentiable binarization. Experiments were conducted on three datasets: ICDAR2015, ShopSign1265, and MTWI. Compared with eight other methods, the proposed method achieved the highest F-values on all three datasets, reaching 87.8%, 59.1%, and 74.8%, respectively. These results show that the method effectively improves the accuracy of text detection.
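The differentiable binarization step named in the abstract can be illustrated with a minimal sketch. It assumes the standard formulation from the original DB text detector, B = 1 / (1 + exp(-k(P - T))), where P is the probability map, T is the learned threshold map, and k is an amplification factor (50 in that work). The function name, the toy maps, and the choice of k are illustrative assumptions, not settings confirmed by this paper.

```python
import math


def differentiable_binarization(prob_map, thresh_map, k=50.0):
    """Approximate binary map B = 1 / (1 + exp(-k * (P - T))), element-wise.

    prob_map and thresh_map are equally sized 2D lists with values in [0, 1].
    k = 50 follows the original DB formulation (an assumption here).
    """
    return [
        [1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(prob_map, thresh_map)
    ]


# Toy 2x2 maps with made-up values, for illustration only.
prob = [[0.9, 0.2], [0.5, 0.31]]
thresh = [[0.3, 0.3], [0.5, 0.3]]
binary = differentiable_binarization(prob, thresh)
```

Pixels whose probability is well above the local threshold map to values near 1, those well below map to values near 0, and those near the threshold fall in between. This smooth step is what lets the binarization be trained jointly with the segmentation network.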

Keywords: text detection; feature pyramid; attention mechanism; Segformer; Ghost module; multiscale feature fusion; average pooling; max pooling

Classification number: TP391.4 [Automation and Computer Technology: Computer Application Technology]

 
