Self-distillation via Entropy Transfer for Scene Text Detection


Authors: CHEN Jian-Wei; YANG Fan; LAI Yong-Xuan[3]

Affiliations: [1] School of Aerospace Engineering, Xiamen University, Xiamen 361005; [2] Shenzhen Research Institute, Xiamen University, Shenzhen 518057; [3] School of Informatics, Xiamen University, Xiamen 361005

Source: Acta Automatica Sinica (《自动化学报》), 2024, No. 11, pp. 2128-2139 (12 pages)

Funding: Supported by the Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2021ZD0112600); the National Natural Science Foundation of China General Program (62173282, 61872154); the Guangdong Natural Science Foundation (2021A1515011578); and the Shenzhen Basic Research General Project (JCYJ20190809161603551).

Abstract: Most state-of-the-art text detection methods for natural scenes are based on fully convolutional segmentation networks, which effectively detect text of arbitrary shape using pixel-level classification results. Their main drawbacks, namely large model size, long inference time, and high memory consumption, hinder deployment in practical applications. This paper proposes self-distillation via entropy transfer (SDET), which takes the information entropy of the segmentation map (SM) output by the deep layers of the text detection network as the knowledge to be transferred, and feeds it back to the shallow layers through an auxiliary network. Unlike traditional knowledge distillation (KD), which relies on a teacher network, SDET adds only an auxiliary network during the training stage, realizing teacher-free self-distillation (SD) at a small extra training cost. Experiments on multiple standard natural scene text detection datasets demonstrate that SDET significantly improves the recall and F1 score of baseline text detection networks and outperforms other distillation methods.
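The entropy-transfer idea described in the abstract can be sketched as follows. This is a minimal illustration only: `entropy_map` and `sdet_transfer_loss` are hypothetical names, the segmentation map is modeled as a per-pixel text/non-text probability, and the squared-error transfer term is an illustrative stand-in for the paper's actual auxiliary-network loss.

```python
import numpy as np

def entropy_map(p, eps=1e-8):
    """Pixel-wise binary entropy of a text/non-text probability map."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))

def sdet_transfer_loss(shallow_sm, deep_sm):
    """Illustrative entropy-transfer loss: push the entropy of the
    shallow layer's segmentation map toward that of the deep layer's
    map, which serves as the knowledge to be transferred."""
    target = entropy_map(deep_sm)          # knowledge from deep layers
    student = entropy_map(shallow_sm)      # shallow-layer prediction
    return float(np.mean((student - target) ** 2))
```

In this sketch the deep layer's entropy map plays the role of the teacher signal, so no separate teacher network is needed; in the actual method the feedback passes through an auxiliary network that exists only at training time.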

Keywords: natural scene; text detection; knowledge distillation; self-distillation; information entropy

Classification: TP391.41 [Automation and Computer Technology — Computer Application Technology]; TP18 [Automation and Computer Technology — Computer Science and Technology]
