Natural Scene Text Recognition Based on Multi-Head Self-Attention and Long Short-Term Memory Network


Authors: YAO Wei (姚炜) [1]; FENG Xianwei (冯宪伟) [1]

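Affiliation: [1] Office of Industry-Education Integration, Jiangsu Vocational Institute of Commerce (江苏经贸职业技术学院), Nanjing, Jiangsu 211168, China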

Source: Chinese Journal of Sensors and Actuators (《传感技术学报》), 2024, No. 12, pp. 2107-2112 (6 pages)

Funding: Key Project of the Jiangsu Provincial Education Science Planning Program, 2024 (B-b/2024/02/116).

Abstract: With the continuous development of computer vision and natural language processing, natural scene text detection and recognition has become a research hotspot in computer vision. This paper proposes a natural scene text detection and recognition method based on a multi-head attention mechanism and a long short-term memory (LSTM) network. The method combines an object detection algorithm with a sequence recognition algorithm: a multi-head attention mechanism locates text regions in the image and extracts their features, and an LSTM network then encodes and decodes the extracted features to recognize the text. In the text detection stage, a deep-learning-based object detection algorithm is combined with multi-head attention, computing multiple independent attention heads in parallel to capture text information at different scales and orientations, which improves the accuracy and robustness of detection. In the text recognition stage, the LSTM network performs sequence modeling on the detected text regions and, through the encoding and decoding process, converts the text information in the image into a readable character sequence. Experimental results show that the proposed method performs well on natural scene text detection and recognition tasks. Compared with existing methods, it improves both accuracy and robustness, and adapts better to complex backgrounds and diverse text.
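The recognition pipeline outlined in the abstract (multi-head attention over image features, followed by LSTM sequence modeling) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no layer sizes, training procedure, or decoder details, so the dimensions (`T`, `d`, `n_heads`, `d_head`, `hidden`), the random weights, and all function names below are hypothetical. The sketch also omits the detection stage, biases, and the character decoder; it only shows how attended features could be passed into an LSTM.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def rand_matrix(rows, cols, rng):
    """Random weights stand in for trained parameters (assumption)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def multi_head_self_attention(seq, heads):
    """seq: T vectors of size d. heads: list of (Wq, Wk, Wv), each d_head x d.
    Returns T vectors of size len(heads) * d_head (head outputs concatenated)."""
    out = [[] for _ in seq]
    for Wq, Wk, Wv in heads:
        Q = [matvec(Wq, x) for x in seq]
        K = [matvec(Wk, x) for x in seq]
        V = [matvec(Wv, x) for x in seq]
        scale = math.sqrt(len(Q[0]))
        for t, q in enumerate(Q):
            # Scaled dot-product scores of position t against every position.
            weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in K])
            ctx = [sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))]
            out[t].extend(ctx)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_forward(seq, W, hidden):
    """Run a bias-free LSTM over seq. W maps gate name ('i', 'f', 'o', 'g')
    to a hidden x (input + hidden) matrix. Returns the hidden state per step."""
    h = [0.0] * hidden
    c = [0.0] * hidden
    states = []
    for x in seq:
        z = x + h  # concatenate the input with the previous hidden state
        i = [sigmoid(v) for v in matvec(W["i"], z)]    # input gate
        f = [sigmoid(v) for v in matvec(W["f"], z)]    # forget gate
        o = [sigmoid(v) for v in matvec(W["o"], z)]    # output gate
        g = [math.tanh(v) for v in matvec(W["g"], z)]  # candidate cell state
        c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        states.append(h)
    return states

# Hypothetical sizes: 5 feature columns of width 8, 2 heads of width 4, 6 LSTM units.
rng = random.Random(0)
T, d, n_heads, d_head, hidden = 5, 8, 2, 4, 6
features = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(T)]
heads = [tuple(rand_matrix(d_head, d, rng) for _ in range(3)) for _ in range(n_heads)]
attended = multi_head_self_attention(features, heads)
lstm_W = {gate: rand_matrix(hidden, n_heads * d_head + hidden, rng) for gate in "ifog"}
states = lstm_forward(attended, lstm_W, hidden)
```

In a full recognizer, each entry of `states` would be mapped to character probabilities by a trained output layer; that step is left out here because the abstract does not describe it.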

Keywords: text detection and recognition; multi-head attention mechanism; natural scene text; long short-term memory network

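Classification: TN911.73 [Electronics and Telecommunications: Communication and Information Systems]; TP183 [Electronics and Telecommunications: Information and Communication Engineering]; TP391.43 [Automation and Computer Technology: Control Theory and Control Engineering]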

 
