Mathematical Formula Extraction Method for Printed Documents Based on Fuzzy Classification  Times cited: 2

Authors: Tian Xuedong (田学东)[1], Hao Nan (郝楠)[1]

Affiliation: [1] College of Mathematics and Computer Science, Hebei University, Baoding, Hebei 071002, China

Source: Journal of Computer Applications (《计算机应用》), 2007, No. 8, pp. 2036-2037, 2065 (3 pages)

Funding: Hebei Provincial Science and Technology Research and Development Program (Project No. 06213598)

Abstract: Formula extraction is a fundamental step in the recognition of printed mathematical formulas. Most existing recognition methods assume that the regions containing formulas are already known, and research on the extraction step itself remains scarce. By introducing fuzzy classification theory, this paper proposes an algorithm for extracting isolated mathematical formulas. Based on statistical analysis of a large number of training sample pages, six features, including degree of irregularity, width-to-height ratio and density, were selected. From these features, fuzzy classification rules were built to distinguish isolated formula lines, text lines and title lines, so that isolated formula lines can be extracted. Experimental results show that the method achieves high accuracy and good robustness.
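The abstract names only three of the six features and does not give the rule base, so the following Python sketch is illustrative only. It shows how fuzzy rule-based classification of a text line into formula, text or title could work, assuming a binary line image as input. The irregularity definition (spread of per-column ink heights), all membership breakpoints and the three rules are hypothetical placeholders, not the authors' values. Fuzzy AND is taken as `min`, fuzzy OR as `max`, and the line is assigned to the class whose rule fires most strongly.

```python
"""Sketch of fuzzy rule-based line classification (formula / text / title).

All feature definitions, membership breakpoints and rules here are
illustrative assumptions, not the values used in the paper.
"""
from statistics import pstdev


def line_features(img):
    """Compute three features from a binary line image.

    img: non-empty list of rows, each a list of 0/1 ints (1 = ink).
    Returns (aspect_ratio, density, irregularity).
    """
    rows = [r for r, row in enumerate(img) if any(row)]
    cols = [c for c in range(len(img[0])) if any(row[c] for row in img)]
    if not rows:
        raise ValueError("empty line image")
    top, bottom, left, right = rows[0], rows[-1], cols[0], cols[-1]
    height = bottom - top + 1
    width = right - left + 1
    # Density: fraction of ink pixels inside the bounding box.
    ink = sum(sum(row[left:right + 1]) for row in img[top:bottom + 1])
    density = ink / (width * height)
    # Hypothetical irregularity: spread of per-column ink extents,
    # normalised by line height (tall fractions/operators raise it).
    extents = []
    for c in cols:
        ys = [r for r in range(top, bottom + 1) if img[r][c]]
        extents.append(ys[-1] - ys[0] + 1)
    irregularity = pstdev(extents) / height
    return width / height, density, irregularity


def rising(x, a, b):
    """'x is HIGH': 0 below a, 1 above b, linear in between."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def falling(x, a, b):
    """'x is LOW': complement of rising()."""
    return 1.0 - rising(x, a, b)


def classify(aspect, density, irregularity):
    """Return (label, rule strengths); breakpoints are made-up placeholders."""
    irr_high = rising(irregularity, 0.15, 0.35)
    irr_low = falling(irregularity, 0.15, 0.35)
    long_line = rising(aspect, 15, 30)
    short_line = falling(aspect, 8, 20)
    dense = rising(density, 0.30, 0.45)
    sparse = falling(density, 0.20, 0.35)
    scores = {
        # Formula: irregular AND (short OR sparse).
        "formula": min(irr_high, max(short_line, sparse)),
        # Text: regular AND long.
        "text": min(irr_low, long_line),
        # Title: regular AND short AND dense (e.g. bold type).
        "title": min(irr_low, short_line, dense),
    }
    label = max(scores, key=scores.get)
    if scores[label] == 0.0:
        label = "unknown"  # no rule fires at all
    return label, scores
```

Usage: `label, scores = classify(*line_features(img))`. Keeping the per-rule strengths, rather than only the winning label, lets a caller reject lines whose best score is low, which is one way the robustness of such a scheme could be tuned.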

Keywords: printed mathematical formula recognition; formula extraction; fuzzy classification

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]
