Scene text recognition algorithm based on multi-network and multi-head attention fusion  Cited by: 1


Authors: Jia Xiaoyun[1]; Weng Jiashun; Liu Yanuo (School of Electronic Information and Artificial Intelligence, Shaanxi University of Science & Technology, Xi'an, Shaanxi 710021, China)

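Affiliation: [1] School of Electronic Information and Artificial Intelligence, Shaanxi University of Science & Technology, Xi'an, Shaanxi 710021, China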

Source: Computer Era (《计算机时代》), 2023, No. 8, pp. 46-51 (6 pages)

Abstract: Existing scene text recognition algorithms tend to ignore the global information of the entire text. To address this problem, a natural scene text recognition algorithm based on multi-network and multi-head attention fusion is proposed. First, a multi-network fusion structure is used, in which several residual modules are designed to capture contextual features and semantic features from the visual features. Second, a multi-head attention encoder is proposed for the character prediction stage; it concatenates position information, visual features, and classification information into a new feature space and re-weights them. Experimental results show that the model makes better use of position features, global semantic features, and contextual features to recognize text content more accurately, which improves the accuracy of the model.
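The fusion step described in the abstract (concatenating position information, visual features, and classification information, then re-weighting the result with multi-head attention) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `fused_multi_head_attention`, the feature dimensions, the one-hot position encoding, and the random matrices standing in for learned projection weights are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def fused_multi_head_attention(visual, position, class_probs,
                               num_heads, d_model, rng):
    """Hypothetical sketch: concatenate three feature streams and
    re-weight them with scaled dot-product multi-head self-attention.

    visual:      (T, d_v)   per-character visual features
    position:    (T, d_p)   per-character position encoding
    class_probs: (T, n_cls) per-character classification scores
    Returns the re-weighted features (T, d_model) and the attention
    weights (num_heads, T, T).
    """
    # Concatenate along the feature axis to form the new feature space.
    fused = np.concatenate([visual, position, class_probs], axis=-1)
    T, d_in = fused.shape
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads

    # Random projections stand in for learned weights (assumption).
    w_q, w_k, w_v = (rng.standard_normal((d_in, d_model)) / np.sqrt(d_in)
                     for _ in range(3))
    w_o = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)

    def split_heads(x):
        # (T, d_model) -> (num_heads, T, d_head)
        return x.reshape(T, num_heads, d_head).transpose(1, 0, 2)

    q = split_heads(fused @ w_q)
    k = split_heads(fused @ w_k)
    v = split_heads(fused @ w_v)

    # Each head computes its own attention weights over the T positions.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)
    context = weights @ v                      # (num_heads, T, d_head)

    # Merge the heads back and apply the output projection.
    merged = context.transpose(1, 0, 2).reshape(T, d_model)
    return merged @ w_o, weights


rng = np.random.default_rng(0)
T = 5                                           # number of character slots
visual = rng.standard_normal((T, 16))           # assumed visual feature size
position = np.eye(T)                            # assumed one-hot positions
class_probs = softmax(rng.standard_normal((T, 10)))  # assumed 10 classes
out, weights = fused_multi_head_attention(
    visual, position, class_probs, num_heads=4, d_model=32, rng=rng)
print(out.shape, weights.shape)
```

In this sketch every character slot attends to every other slot, so the re-weighted feature at each position can draw on information from the whole text line, which is the kind of global context the abstract says earlier methods ignore.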

Keywords: scene text recognition; multi-network fusion; multi-head attention mechanism; feature extraction

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
