Authors: REN Rui; WANG Xiaoya; WEN Chengyu [1] (College of Communicating Engineering, Chengdu University of Information Technology, Chengdu 610225, China)
Affiliation: [1] College of Communicating Engineering, Chengdu University of Information Technology, Chengdu, Sichuan 610225
Source: Journal of Chengdu University of Information Technology (《成都信息工程大学学报》), 2025, No. 1, pp. 1-6 (6 pages)
Funding: Sichuan Science and Technology Program (2023YFS0422).
Abstract: Text in real-world scenes is often irregular in shape because of image distortion, cluttered backgrounds, and curved or tilted layouts. Extracting this text enriches an image's semantic information and helps analyze its context, which supports better understanding of the scene. To address these difficulties, an end-to-end scene text recognition technique based on an improved CRNN (Convolutional Recurrent Neural Network) is proposed. In the convolutional layers, features are extracted with an improved Inception structure based on GoogLeNet, in which multi-branch convolutional layers fuse multi-scale features. An attention mechanism is then incorporated to strengthen feature correlation along the channel and spatial dimensions, giving local features a global perspective. In the recurrent layers, a Bi-LSTM (Bidirectional Long Short-Term Memory) network strengthens the contextual relationships between characters for sequence prediction. Finally, the predicted sequence is passed to a CTC (Connectionist Temporal Classification) layer, which transcribes it into the output sequence. Experiments on the IIIT5K dataset and Baidu's Chinese Street View dataset yield accuracies of 95.3% and 91.1%, respectively, demonstrating the reliability of the approach.
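The abstract does not say how the CTC output is turned into text, so the following is only a minimal sketch of greedy (best-path) CTC decoding, one common way to do the transcription step. The function name `ctc_greedy_decode`, the blank index `0`, and the list-of-lists score format are illustrative assumptions, not details taken from the paper.

```python
from itertools import groupby

BLANK = 0  # assumption: index 0 is reserved for the CTC blank symbol


def ctc_greedy_decode(frame_scores, alphabet, blank=BLANK):
    """Greedy (best-path) CTC decoding.

    frame_scores: one list of per-class scores for each time step,
                  e.g. the per-frame outputs of the recurrent layers.
    alphabet:     maps class index to character; the entry at `blank`
                  is a placeholder and never appears in the output.
    """
    # 1. Pick the highest-scoring class at every time step.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # 2. Collapse runs of identical consecutive labels into one label.
    collapsed = [index for index, _ in groupby(best_path)]
    # 3. Drop the blank label and map the remaining indices to characters.
    return "".join(alphabet[index] for index in collapsed if index != blank)
```

Because repeats are collapsed before blanks are removed, a doubled character survives only when a blank separates its two occurrences in the frame-level path, which is why CTC needs the blank symbol at all.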
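Keywords: text recognition; convolutional neural network; attention mechanism; bidirectional long short-term memory
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]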