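Affiliation: [1] College of Physics Science and Technology, Hebei University, Baoding, Hebei 071002, China
Source: Optical Technique (《光学技术》), 2007, No. 1, pp. 79-82 (4 pages)
Funding: Supported by the Natural Science Foundation of Hebei Province (F2004000132)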
Abstract: A new method for optical formula recognition and analysis is proposed. RL (Run_length) features are used in formula symbol extraction and recognition to improve the recognition rate. Symbol images are extracted with a two-layer connected-component search algorithm. The first layer extracts symbols based on RL features and yields the overall connected region of each composite symbol; the second layer applies a traditional search method to identify the single symbols contained in these composite symbols. A dedicated formula symbol recognizer is designed to recognize the symbols. The logical structure of the formula is derived from the semantic information and geometric relations among the symbols, and the final result is expressed as a formula structure tree. Recognition experiments on formulas contained in printed documents gave good results, indicating that the method has good application prospects.
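Keywords: OCR; optical formula recognition; symbol recognition; structural analysis; RL features
Classification: TP391.44 [Automation and Computer Technology: Computer Application Technology]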
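
The two-layer extraction described in the abstract can be illustrated with a small sketch. The abstract does not say how the RL features are computed or applied, so this example substitutes a simple, well-known stand-in: vertical run-length smoothing, which fills short background gaps so that the parts of a composite symbol (for example the two bars of "=") merge into one region. The function names, the `max_gap` parameter, and the use of bounding boxes are assumptions made for illustration, not the authors' implementation.

```python
from collections import deque

def runs(line):
    """Return (start, end) pairs, end exclusive, of consecutive 1-pixels in a line."""
    out, start = [], None
    for i, v in enumerate(line):
        if v and start is None:
            start = i
        elif not v and start is not None:
            out.append((start, i))
            start = None
    if start is not None:
        out.append((start, len(line)))
    return out

def smear_columns(img, max_gap):
    """Assumed layer-1 step: fill vertical background gaps of at most max_gap pixels."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for x in range(w):
        col_runs = runs([img[y][x] for y in range(h)])
        for (_, end_prev), (start_next, _) in zip(col_runs, col_runs[1:]):
            if start_next - end_prev <= max_gap:
                for y in range(end_prev, start_next):
                    out[y][x] = 1
    return out

def components(img):
    """8-connected components in scan order, as inclusive boxes (x0, y0, x1, y1)."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(x, y)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for ny in range(max(cy - 1, 0), min(cy + 2, h)):
                    for nx in range(max(cx - 1, 0), min(cx + 2, w)):
                        if img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes

def two_layer_extract(img, max_gap):
    """Layer 1: composite regions on the smeared image.
    Layer 2: ordinary connected components inside each region of the original image."""
    result = []
    for (x0, y0, x1, y1) in components(smear_columns(img, max_gap)):
        crop = [row[x0:x1 + 1] for row in img[y0:y1 + 1]]
        singles = [(a + x0, b + y0, c + x0, d + y0) for (a, b, c, d) in components(crop)]
        result.append(((x0, y0, x1, y1), singles))
    return result

# Toy binary image: an "=" sign (two bars) on the left, a single dot on the right.
img = [
    [1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1],
]
regions = two_layer_extract(img, max_gap=1)
```

Here `regions` pairs each composite region with the single symbols found inside it. Cropping by bounding box is a simplification: in a dense formula, a box could also capture pixels from a neighbouring region, which a fuller implementation would mask out.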