Authors: LI Pengchao, PENG Liangrui, WEN Juan
Affiliations: [1] Tsinghua National Laboratory for Information Science and Technology, Dept. of Electronic Engineering, Tsinghua University, Beijing 100084, China; [2] Equipment Academy, Beijing 101416, China
Source: Chinese Journal of Electronics (English edition of 电子学报), 2016, Issue 3, pp. 520-526 (7 pages)
Funding: Supported by the National Basic Research Program of China (973 Program) (No. 2014CB340506); the National Natural Science Foundation of China (No. 61261130590, No. 61032008); and the Tsinghua National Laboratory for Information Science and Technology (TNList) Cross-discipline Foundation
Abstract: Although optical character recognition (OCR) technology has made great progress in recent years, character misrecognition remains inevitable. To achieve high-fidelity document digitization, we propose a new confidence estimation method based on convolutional neural networks (CNN). Misrecognized characters are detected by comparing the confidence value with a preset threshold, so that recognition errors can be left as embedded images in the output digital documents. In the CNN architecture design, we adopt softmax as the estimate of the posterior probability, together with overlapping pooling and maxout with dropout. Experimental results show that our method achieves a clear improvement over the baseline system.
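The thresholding step described in the abstract can be illustrated with a minimal sketch. It assumes that the CNN emits one logit per character class and that the confidence value is the largest softmax probability. The record does not say which confidence measure or threshold the authors actually use, so `route_character` and its default `threshold=0.9` are hypothetical. The CNN itself, including overlapping pooling and maxout with dropout, is omitted; only the decision rule applied to its output is shown.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def estimate_confidence(logits: Sequence[float]) -> Tuple[int, float]:
    """Return (predicted class index, confidence).

    Assumption: the top softmax probability serves as the posterior-probability
    estimate, i.e. the confidence of the recognized character.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]


def route_character(
    logits: Sequence[float], threshold: float = 0.9
) -> Tuple[str, int, float]:
    """Hypothetical routing rule with a hypothetical default threshold.

    Confidence >= threshold: emit the OCR result as text.
    Otherwise: treat it as a suspected misrecognition and keep the
    character as an embedded image in the output document.
    """
    label, conf = estimate_confidence(logits)
    if conf >= threshold:
        return ("text", label, conf)
    return ("image", label, conf)


if __name__ == "__main__":
    print(route_character([4.0, 0.5, 0.1])[0])  # text  (top probability ~0.95)
    print(route_character([1.0, 0.9, 0.8])[0])  # image (top probability ~0.37)
```

Raising the threshold sends more characters to the image path. That lowers the chance of a silent recognition error in the digital text, but more of the output is left as images.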
Keywords: Optical character recognition (OCR); Confidence estimation; Convolutional neural networks (CNN)
Classification number (CLC): TP391.41 [Automation and Computer Technology: Computer Application Technology]