Path Search Method Based on CNN Image Recognition and Semantic Reliability

Cited by: 10


Authors: LI Yuxia; SUN Yongqi[1]; YAN Ru; ZHU Weiguo (School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China)

Affiliation: [1] School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China

Source: Computer Engineering, 2021, Issue 1, pp. 255-263, 274 (10 pages in total)

Funding: National Natural Science Foundation of China (61572005, 61672086, 61272004).

Abstract: Optical Character Recognition (OCR) technology can effectively improve the efficiency of entering bill information in bill-processing applications. To address the problem that complex bill backgrounds and irregular handwritten characters reduce bill recognition accuracy, this paper combines Convolutional Neural Network (CNN) image recognition with semantic reliability and proposes a reliability-first path search method that reduces the interference of ambiguous characters on the search path. A prefix and suffix inference strategy, based on the structural characteristics of company names, effectively resolves recognition errors in company-name prefixes and suffixes. Jieba Chinese word segmentation and character position information are used to detect errors in the recognition results, and a Long Short-Term Memory (LSTM) language model is combined with a Chinese character component similarity, introduced on top of the traditional glyph similarity, to correct them. Experimental results show that combining these error correction strategies with the proposed method raises company name recognition accuracy to 93.08%.
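The abstract does not spell out the search procedure, so the following is only a minimal sketch of one plausible reading of "reliability first": positions whose top CNN candidate is confident are fixed first, and the remaining ambiguous positions are then resolved using semantic context from their already-fixed neighbours. The function name `reliability_first_decode`, the confidence threshold, the toy `bigram` score table (standing in for a language model), and the example data are all assumptions made for illustration and are not taken from the paper.

```python
from typing import Dict, List, Optional, Tuple

# Per character position: a list of (candidate character, CNN confidence).
Candidates = List[List[Tuple[str, float]]]


def reliability_first_decode(
    candidates: Candidates,
    bigram: Dict[str, float],
    threshold: float = 0.9,  # assumed cut-off for a "reliable" character
) -> str:
    n = len(candidates)
    result: List[Optional[str]] = [None] * n

    # Step 1: fix every position whose best candidate is reliable enough.
    for i, cands in enumerate(candidates):
        ch, conf = max(cands, key=lambda c: c[1])
        if conf >= threshold:
            result[i] = ch

    # Step 2: visit the remaining positions from most to least confident,
    # scoring each candidate by its CNN confidence plus the (hypothetical)
    # bigram score it forms with neighbours that are already decided.
    pending = sorted(
        (i for i in range(n) if result[i] is None),
        key=lambda i: -max(c[1] for c in candidates[i]),
    )
    for i in pending:
        def score(cand: Tuple[str, float]) -> float:
            ch, conf = cand
            total = conf
            if i > 0 and result[i - 1] is not None:
                total += bigram.get(result[i - 1] + ch, 0.0)
            if i + 1 < n and result[i + 1] is not None:
                total += bigram.get(ch + result[i + 1], 0.0)
            return total

        result[i] = max(candidates[i], key=score)[0]

    return "".join(ch for ch in result if ch is not None)


# Toy example: the third character is ambiguous and the CNN's top guess is wrong.
candidates = [
    [("北", 0.98)],
    [("京", 0.95)],
    [("料", 0.52), ("科", 0.48)],
    [("技", 0.97)],
]
bigram = {"科技": 0.5, "京科": 0.1}  # made-up semantic scores

greedy = "".join(max(c, key=lambda x: x[1])[0] for c in candidates)
decoded = reliability_first_decode(candidates, bigram)
print(greedy, decoded)
```

In this toy input, picking the top candidate at each position independently keeps the misread character, whereas deciding the confident neighbours first lets the context score outweigh the small confidence gap. A real system would replace the bigram table with a trained language model.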

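Keywords: text recognition; language model; Convolutional Neural Network (CNN); Long Short-Term Memory (LSTM) network; glyph similarity; Jieba Chinese word segmentation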

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
