Structural analysis of printed mathematical expressions using LL(1) grammar  Cited by: 4


Authors: Wu Wei[1], Hou Lichang[1]

Affiliation: [1] Department of Applied Mathematics, Dalian University of Technology, Dalian 116024, Liaoning, China

Source: Journal of Dalian University of Technology, 2006, No. 3, pp. 454-459 (6 pages)

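Funding: National Natural Science Foundation of China (19971012); National Defense Basic Scientific Research Fund of the Commission of Science, Technology and Industry for National Defense (J1700B002); Liaoning Province Fund for Academic Discipline Leaders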

Abstract: Current optical character recognition (OCR) systems achieve high recognition rates on handwritten and printed text, but they lack the ability to analyse and reconstruct the structure of mathematical expressions. To address this, the basic design methods of programming-language compilers are applied to the structural analysis of mathematical expressions. The paper focuses on locating superscripts and subscripts, the LL(1) grammar rules for composing expressions, and the design of the structure analyzer; a neural-network-based method for recognizing mathematical symbols is described briefly. For mathematical expressions in printed scientific documents, each symbol is first recognized through preprocessing, character segmentation and classification, yielding a string of characters sorted by left border. The structure analyzer then locates superscripts and subscripts and determines the sequential relationships among the symbols. Finally, the syntax tree produced by the structure analyzer is converted into an editable LaTeX document. Worked examples show fairly satisfactory results.
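
The pipeline described in the abstract (symbols sorted by left border, superscript and subscript location, LL(1) parsing, syntax tree to LaTeX) can be illustrated with a minimal Python sketch. The abstract gives neither the paper's grammar nor its positioning rules, so everything here is an assumption made for illustration: the `Sym` bounding-box record, the 25% vertical-centre threshold, the toy grammar (`Expr -> Term { OP Term }`, `Term -> Factor { Factor }`, `Factor -> SYM { (SUP | SUB) Expr CLOSE }`), and the function names. Nested scripts, fractions and parentheses are not handled.

```python
from dataclasses import dataclass

@dataclass
class Sym:
    """A recognized symbol with its bounding box (hypothetical record)."""
    char: str
    left: float
    top: float      # image coordinates: y grows downward
    bottom: float

def linearize(symbols):
    """Sort by left border, tag scripts, and emit a flat token list."""
    tokens, base, level = [], None, "base"
    for s in sorted(symbols, key=lambda s: s.left):
        centre = (s.top + s.bottom) / 2
        new = "base"
        if base is not None:
            h = base.bottom - base.top
            if centre < base.top + 0.25 * h:       # assumed threshold
                new = "sup"
            elif centre > base.bottom - 0.25 * h:  # assumed threshold
                new = "sub"
        if new != level:                 # leaving or entering a script
            if level != "base":
                tokens.append(("CLOSE", ""))
            if new != "base":
                tokens.append((new.upper(), ""))
            level = new
        is_op = s.char in ("+", "-")
        if new == "base" and not is_op:
            base = s                     # operators never become the reference
        tokens.append(("OP" if is_op else "SYM", s.char))
    if level != "base":
        tokens.append(("CLOSE", ""))
    tokens.append(("END", ""))
    return tokens

class Parser:
    """Predictive parser: every decision uses one token of lookahead."""
    def __init__(self, tokens):
        self.toks, self.i = tokens, 0

    def peek(self):
        return self.toks[self.i][0]

    def take(self, kind):
        tok = self.toks[self.i]
        if tok[0] != kind:
            raise SyntaxError(f"expected {kind}, got {tok[0]}")
        self.i += 1
        return tok[1]

    def expr(self):      # Expr -> Term { OP Term }
        node = self.term()
        while self.peek() == "OP":
            op = self.take("OP")
            node = ("bin", op, node, self.term())
        return node

    def term(self):      # Term -> Factor { Factor }  (juxtaposition)
        node = self.factor()
        while self.peek() == "SYM":
            node = ("cat", node, self.factor())
        return node

    def factor(self):    # Factor -> SYM { (SUP | SUB) Expr CLOSE }
        node = ("sym", self.take("SYM"))
        while self.peek() in ("SUP", "SUB"):
            kind = self.peek()
            self.take(kind)
            body = self.expr()
            self.take("CLOSE")
            node = (kind.lower(), node, body)
        return node

def to_latex(node):
    """Convert the syntax tree (nested tuples) to a LaTeX string."""
    tag = node[0]
    if tag == "sym":
        return node[1]
    if tag == "bin":
        return f"{to_latex(node[2])} {node[1]} {to_latex(node[3])}"
    if tag == "cat":
        return to_latex(node[1]) + to_latex(node[2])
    mark = "^" if tag == "sup" else "_"
    return f"{to_latex(node[1])}{mark}{{{to_latex(node[2])}}}"

def recognise(symbols):
    parser = Parser(linearize(symbols))
    tree = parser.expr()
    parser.take("END")
    return to_latex(tree)
```

Because each production is chosen by inspecting only the next token kind (`OP`, `SYM`, `SUP`, `SUB`), the parser needs no backtracking, which is the property an LL(1) grammar is meant to guarantee. Calling `recognise` on boxes for `x`, a raised `2`, `+`, `y` and a lowered `i` would produce a string in the form `x^{2} + y_{i}`.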

Keywords: formula reconstruction; structural analysis; pattern recognition; LL(1) grammar; neural network

Classification: TP391.43 [Automation and Computer Technology: Computer Application Technology]

 
