Affiliation: [1] School of Information and Control Engineering, Shenyang Jianzhu University, Shenyang, Liaoning 110168
Source: Journal of Shenyang Jianzhu University: Natural Science (《沈阳建筑大学学报(自然科学版)》), 2008, No. 2, pp. 333-336 (4 pages)
Funding: Natural Science Foundation of Liaoning Province (20052006)
Abstract: Objective: To automatically extract and restore the layout information of Chinese documents when digitizing paper documents. Methods: Connected components are searched, and non-text regions are extracted first according to the size features of the connected components. The extracted non-text regions are then classified as tables, lines, or images using their projection histograms, aspect ratios, and black-to-white pixel ratios. Text regions are segmented with an improved projection-based horizontal and vertical cutting method. The XML document format is used to describe, organize, and restore the data and style of the original layout, and reconstruction generates a general-purpose electronic document that preserves the original layout, so that the original page is reproduced. Results: Experiments on a large number of book sample pages and on complex sample pages containing tables, images, and mixed horizontal and vertical text show that the improved layout analysis method segments accurately and runs quickly, and that the XML-based reconstruction method reconstructs the document layout fairly accurately. Conclusion: Threshold parameters derived from statistical features are used in the improved layout analysis method, which improves the adaptability of the system. The method works well on fairly regular documents and is largely applicable to complex layouts with some manual intervention.
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
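The abstract names two steps that a short example can make concrete: projection-based horizontal and vertical cutting of a page, and an XML description of the resulting layout. The sketch below is a minimal illustration of the classic recursive projection cut, not the authors' improved variant, whose details the record does not give. The function names (`runs`, `xy_cut`, `layout_to_xml`, `demo_page`), the XML element and attribute names, and the fixed `min_gap` value are all assumptions made for illustration; the paper derives its thresholds from statistical features instead of a constant.

```python
import xml.etree.ElementTree as ET


def runs(profile, min_gap):
    """Return (start, end) runs of non-empty bins, end exclusive.

    Runs separated by fewer than min_gap empty bins are merged.
    """
    merged = []
    start = None
    for i, value in enumerate(list(profile) + [0]):  # sentinel closes the last run
        if value > 0 and start is None:
            start = i
        elif value == 0 and start is not None:
            if merged and start - merged[-1][1] < min_gap:
                merged[-1][1] = i  # gap too small: extend the previous run
            else:
                merged.append([start, i])
            start = None
    return [tuple(run) for run in merged]


def xy_cut(img, top, left, bottom, right, min_gap=2, by_rows=True, stalled=False):
    """Recursively split a binary image (1 = black pixel) at blank gaps.

    Alternates between row and column projections. Returns a list of
    (top, left, bottom, right) boxes with exclusive bottom and right.
    min_gap is an assumed fixed threshold, not the paper's statistical one.
    """
    if by_rows:
        profile = [sum(img[r][left:right]) for r in range(top, bottom)]
        parts = [(top + a, left, top + b, right) for a, b in runs(profile, min_gap)]
    else:
        profile = [sum(img[r][c] for r in range(top, bottom)) for c in range(left, right)]
        parts = [(top, left + a, bottom, left + b) for a, b in runs(profile, min_gap)]
    # No cut found in this direction or the previous one: the region is a block.
    if len(parts) == 1 and stalled:
        return parts
    blocks = []
    for t, l, b, r in parts:
        blocks += xy_cut(img, t, l, b, r, min_gap, not by_rows, len(parts) == 1)
    return blocks


def layout_to_xml(blocks):
    """Describe the blocks as XML; element and attribute names are hypothetical."""
    page = ET.Element("page")
    for t, l, b, r in blocks:
        ET.SubElement(page, "block", top=str(t), left=str(l), bottom=str(b), right=str(r))
    return ET.tostring(page, encoding="unicode")


def demo_page():
    """Build a toy 8x10 page: one full-width block above two side-by-side blocks."""
    img = [[0] * 10 for _ in range(8)]
    for t, l, b, r in [(0, 0, 2, 10), (5, 0, 8, 4), (5, 7, 8, 10)]:
        for row in range(t, b):
            for col in range(l, r):
                img[row][col] = 1
    return img


if __name__ == "__main__":
    page = demo_page()
    found = xy_cut(page, 0, 0, len(page), len(page[0]))
    print(found)
    print(layout_to_xml(found))
```

Representing each block as a bounding box keeps the XML step independent of the segmentation step, so a different cutting rule (for example, one with thresholds estimated from the page itself) could be swapped in without changing the layout description.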