Development of a Multi-engine Printed Chinese Character Recognition System


Authors: Liang Ying (梁莹)[1], Xiao Jian (肖健)[1], Li Yue (李玥)[1]

Affiliation: [1] Guangxi Computing Center (广西计算中心), Nanning, Guangxi 530022

Source: Journal of Guangxi Academy of Sciences (《广西科学院学报》), 2011, No. 4, pp. 317-319 (3 pages)

Abstract: A printed Chinese character recognition system based on multiple engines is designed. The system gives priority to the layout analysis produced by the Hanwang optical character recognition (OCR) engine (HW-OCR). After the Hanwang and Tsinghua (TH-OCR) engines have each completed character recognition, the system merges their results according to the image coordinates of the characters. It then uses color to highlight three kinds of suspect characters: those on which the two OCR engines conflict, those with low confidence, and those flagged as errors by the WiseCheck semantic proofreading engine. In existing large-scale digitization production lines, proofreaders compare the recognized text against the page image character by character across the full text. By directing attention to the highlighted characters, the system improves on this working mode and can reduce labor intensity, raise work efficiency, and lower processing costs.
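
The coordinate-based merging step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the matching rule, the data layout, or any thresholds, so everything here is an assumption made for illustration: the names `OcrChar`, `MergedChar`, `iou`, and `merge_results` are hypothetical, boxes are matched by intersection-over-union, confidences are assumed to lie in [0, 1], and the semantic checker is represented only by a set of flagged character indices rather than by any real WiseCheck interface.

```python
from dataclasses import dataclass
from typing import AbstractSet, List, Tuple

# (left, top, right, bottom) in image pixels; an assumed layout.
Box = Tuple[int, int, int, int]


@dataclass
class OcrChar:
    text: str
    box: Box
    confidence: float  # assumed to be normalised to [0, 1]


@dataclass
class MergedChar:
    text: str
    box: Box
    flag: str  # "ok", "conflict", "low_confidence" or "semantic"


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    if width <= 0 or height <= 0:
        return 0.0
    inter = width * height
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def merge_results(
    primary: List[OcrChar],
    secondary: List[OcrChar],
    semantic_errors: AbstractSet[int] = frozenset(),
    min_iou: float = 0.5,
    min_conf: float = 0.8,
) -> List[MergedChar]:
    """Merge two engines' output, keeping the primary engine's text and boxes.

    semantic_errors holds indices into `primary` that a semantic checker
    flagged (a stand-in for the proofreading engine's output).
    """
    merged = []
    for index, p in enumerate(primary):
        # Pick the secondary character whose box overlaps this one the most.
        best = max(secondary, key=lambda s: iou(p.box, s.box), default=None)
        if best is not None and iou(p.box, best.box) < min_iou:
            best = None  # overlap too small to count as the same character

        if best is None or best.text != p.text:
            # Assumption: a character the second engine does not confirm
            # is treated the same way as an outright disagreement.
            flag = "conflict"
        elif min(p.confidence, best.confidence) < min_conf:
            flag = "low_confidence"
        elif index in semantic_errors:
            flag = "semantic"
        else:
            flag = "ok"
        merged.append(MergedChar(p.text, p.box, flag))
    return merged
```

A display layer could then map each flag to a highlight color, so that a proofreader only needs to inspect characters whose flag is not `"ok"`. The order of the checks (conflict first, then confidence, then semantic) is a design choice of this sketch, not something the abstract specifies.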

Keywords: Chinese character recognition; optical character recognition; semantic proofreading; multi-engine

Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
