Authors: FANG Haodong (方浩东); BAO Min (鲍敏) [1]
Affiliation: [1] School of Mechanical Engineering, Zhejiang Sci-Tech University, Hangzhou 310018, Zhejiang, China
Source: Software Engineering (《软件工程》), 2023, No. 5, pp. 20-23, 10 (5 pages in total)
Abstract: A target bearing manufacturer records raw materials on handwritten forms that are hard to store, rarely reused, slow to register by hand, and prone to entry errors. To address these problems, this paper designs a recognition system for standardized raw-material forms, based on morphological detection and Tesseract-OCR character recognition. The system preprocesses each handwritten form with binarization, noise reduction, and skew correction, then extracts the table frame by morphological detection. Cells are segmented using a dynamic mask and corner detection, and a character library trained with the jTessBoxEditor tool is used to recognize the handwritten content. Experimental results show that the system recognizes an image in only 6.88 s with an accuracy of 96%, indicating high practical value.
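Keywords: preprocessing; morphological detection; Tesseract-OCR; table frame; dynamic mask
Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]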
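Illustrative note: the abstract names morphological detection as the way the table frame is extracted, but this record gives no implementation details. The sketch below is therefore not the authors' code. It shows one common reading of the idea on a tiny binary grid: a morphological opening with a long horizontal (or vertical) structuring element keeps only strokes at least `k` pixels long, so ruling lines survive while short handwriting strokes are removed. Pixels that lie on both a horizontal and a vertical line are reported as candidate cell corners. All function names, the 7x7 test grid, and the choice `k = 5` are assumptions made for illustration; a real system would presumably work on scanned images with an image-processing library.

```python
# Illustrative sketch only: pure-Python binary morphology on a tiny grid.
# Every name here is hypothetical and not taken from the paper.

def erode_h(img, k):
    """Set a pixel to 1 only if it starts a horizontal run of k ones."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w - k + 1):
            if all(img[y][x + i] for i in range(k)):
                out[y][x] = 1
    return out

def dilate_h(img, k):
    """Spread each 1 pixel over the k pixels starting at it."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                for i in range(k):
                    if x + i < w:
                        out[y][x + i] = 1
    return out

def open_h(img, k):
    """Opening (erosion, then dilation): keeps horizontal runs of length >= k."""
    return dilate_h(erode_h(img, k), k)

def transpose(img):
    return [list(col) for col in zip(*img)]

def open_v(img, k):
    """Vertical opening, implemented by transposing and reusing open_h."""
    return transpose(open_h(transpose(img), k))

def extract_frame(img, k):
    """Return (frame, joints): the union of long lines and their intersections."""
    horiz = open_h(img, k)
    vert = open_v(img, k)
    h, w = len(img), len(img[0])
    frame = [[horiz[y][x] | vert[y][x] for x in range(w)] for y in range(h)]
    joints = [(y, x) for y in range(h) for x in range(w)
              if horiz[y][x] and vert[y][x]]
    return frame, joints

# Toy form: a 7x7 frame with ruling lines on rows 0, 3, 6 and columns 0, 6.
N = 7
clean = [[1 if (y in (0, 3, 6) or x in (0, 6)) else 0 for x in range(N)]
         for y in range(N)]

# Add a few short "handwriting" strokes inside the cells.
img = [row[:] for row in clean]
for y, x in [(1, 3), (4, 2), (4, 3)]:
    img[y][x] = 1

frame, joints = extract_frame(img, 5)
for row in frame:
    print("".join("#" if v else "." for v in row))
print(joints)
```

In this toy case the opening removes the three stroke pixels and returns the clean frame, and the six line intersections mark the corners of the two cells. The minimum line length `k` trades robustness against broken ruling lines for rejection of long handwritten strokes, so on real scans it would have to be tuned to the form layout.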