End-to-End Irregular Text Retrieval via Cross-modal Similarity Learning


Authors: LI Yan; ZHANG Minyi; SU Hanchen; LI Fangfang [3]; LI Binyang [1]

Affiliations: [1] School of Cyber Science and Engineering, University of International Relations, Beijing 100191, China; [2] Advertising Institute, Communication University of China, Beijing 100024, China; [3] School of Computer Science and Engineering, Central South University, Changsha 410083, Hunan, China

Source: Radio Engineering, 2023, Issue 3, pp. 501-507 (7 pages)

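Funding: National Natural Science Foundation of China (61976066); Beijing Natural Science Foundation (4212031); Natural Science Foundation of Hunan Province (2021JJ30870); Special Research Fund for the Construction of the National Security High-Level Advanced Discipline, University of International Relations (2019GA43, 2021GA07).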

Abstract: Scene text retrieval is the task of searching a scene for, and localizing, text instances that are identical or similar to a given query text. Implemented with computer vision methods, it helps users automatically find text of interest in a specified scene, and is therefore widely used in applications such as image security review, product image retrieval, and book retrieval. However, text in some scenes often appears in irregular forms such as bending, compression, and stretching, which makes extracting and matching text regions highly challenging. To address this problem, an end-to-end network model is established that unifies irregular text extraction and cross-modal similarity learning in a single framework and ranks the detected text instances by the learned similarity, thereby retrieving irregular text. Experimental results on three datasets, SVT, STR, and CTR, show that the proposed framework improves average accuracy by 1% to 3% over the best existing text retrieval methods while maintaining an inference speed of 3.7 frames per second. To further verify the effectiveness of the proposed method for irregular text retrieval, a new irregular text dataset, AIDATA, is established and the method is compared against STR-TDSL on it. The results show that average accuracy improves by more than 25% while inference speed drops by less than 20%.
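The ranking step described in the abstract can be illustrated with a minimal sketch. This is not the paper's model: the abstract does not state which similarity function is learned, so cosine similarity is used here as an assumption, and the embedding vectors, the instance names (`box_a`, etc.), and the helper `rank_instances` are hypothetical stand-ins for the outputs of the text and image branches.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def rank_instances(query_embedding, instance_embeddings):
    """Return (instance_id, score) pairs sorted by descending similarity."""
    scored = [
        (inst_id, cosine_similarity(query_embedding, emb))
        for inst_id, emb in instance_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical embeddings: one for the query word, one per detected text region.
query = [1.0, 0.0, 1.0]
instances = {
    "box_a": [0.9, 0.1, 1.1],
    "box_b": [0.0, 1.0, 0.0],
    "box_c": [1.0, 1.0, 0.0],
}

ranking = rank_instances(query, instances)
for inst_id, score in ranking:
    print(f"{inst_id}: {score:.3f}")
```

In an end-to-end setting, the embeddings on both sides would be trained jointly with the detector so that matching query and region pairs score highly; the sketch above only shows how such scores would order the detected instances at retrieval time.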

Keywords: scene text retrieval; end-to-end training; irregular text; similarity learning

Classification No.: TP391 [Automation and Computer Technology: Computer Application Technology]

 
