Research on Identification of Printed Mathematical Formula (印刷体数学公式识别研究)


Authors: Lin Yanran (林妍然), Yang Lihong (杨立洪)

Affiliation: [1] School of Mathematics, South China University of Technology, Guangzhou, Guangdong

Source: Hans Journal of Data Mining (《数据挖掘》), 2020, No. 2, pp. 97-110 (14 pages)

Abstract: With the growth of the e-book industry, optical character recognition (OCR) is being applied ever more widely, yet formula recognition has not become commonplace, mainly because formulas lack a regular layout. At the same time, increasingly strict plagiarism checking of papers places new demands on formula recognition. This paper studies the recognition of printed mathematical formulas presented as images. Characters are segmented by combining projection-based line segmentation with connected-component segmentation, recognized by template matching, and then arranged by a structural analysis based on their relative positions. The segmentation method improves segmentation accuracy while preserving speed, and it works well on characters that can be separated. The recognition method exploits the regularity of printed mathematical characters; the algorithm is simple, has low computational complexity, and achieves a recognition accuracy above 97%. The structural analysis covers the common formula structures, with a case-by-case classification that neither overlaps nor omits cases. The system can recognize clearly printed mathematical formulas; computational complexity and touching (stuck-together) characters still need further work.
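The pipeline named in the abstract (projection-based line segmentation, connected-component segmentation, template matching, position-based structural analysis) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The function names, the 8-connectivity choice, the pixel-agreement score, the 3x3 normalization size, and the centre-line rule for superscripts and subscripts are all assumptions made for illustration.

```python
from collections import deque

def to_bitmap(rows):
    """Hypothetical helper: '#' is ink (1), anything else is background (0)."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]

def split_rows(img):
    """Horizontal projection: return (top, bottom) bands of consecutive inked rows."""
    bands, start = [], None
    for y, row in enumerate(img):
        has_ink = any(row)
        if has_ink and start is None:
            start = y
        elif not has_ink and start is not None:
            bands.append((start, y))
            start = None
    if start is not None:
        bands.append((start, len(img)))
    return bands

def connected_components(img, top, bottom):
    """8-connected components inside rows [top, bottom).
    Returns bounding boxes (x0, y0, x1, y1), exclusive on x1/y1, sorted left to right."""
    width, seen, boxes = len(img[0]), set(), []
    for y in range(top, bottom):
        for x in range(width):
            if not img[y][x] or (x, y) in seen:
                continue
            queue = deque([(x, y)])
            seen.add((x, y))
            x0 = x1 = x
            y0 = y1 = y
            while queue:  # breadth-first flood fill of one component
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for dx in (-1, 0, 1):
                    for dy in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < width and top <= ny < bottom
                                and img[ny][nx] and (nx, ny) not in seen):
                            seen.add((nx, ny))
                            queue.append((nx, ny))
            boxes.append((x0, y0, x1 + 1, y1 + 1))
    return sorted(boxes)

def crop_and_resize(img, box, size=3):
    """Nearest-neighbour resample of a box to size x size (assumed normalization)."""
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    return [[img[y0 + (j * h) // size][x0 + (i * w) // size]
             for i in range(size)] for j in range(size)]

def match(glyph, templates):
    """Template matching: pick the template agreeing with the glyph on most pixels."""
    def score(name):
        return sum(a == b for ra, rb in zip(glyph, templates[name])
                   for a, b in zip(ra, rb))
    return max(templates, key=score)

def relation(base, other):
    """Assumed rule: compare the other box with the vertical centre of the base box."""
    centre = (base[1] + base[3]) / 2
    if other[3] <= centre:
        return "superscript"
    if other[1] >= centre:
        return "subscript"
    return "inline"

def recognise(img, templates, size=3):
    """Run the stages in order and return one label string per text line."""
    lines = []
    for top, bottom in split_rows(img):
        boxes = connected_components(img, top, bottom)
        lines.append("".join(match(crop_and_resize(img, b, size), templates)
                             for b in boxes))
    return lines
```

In this sketch every connected component is treated as one character, so multi-part symbols such as "=" or "i" would need an extra merging step, and touching characters would come out as a single box, which is the character-sticking limitation the abstract mentions.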

Keywords: connected-component segmentation; template matching; structural analysis

Classification number: TP3 [Automation and Computer Technology: Computer Science and Technology]

 
