复杂中文报纸的版面分析、理解和重构  (cited by: 12)

Analysis, understanding and representation of Chinese newspapers with complex layout

Authors: CHEN Ming (陈明)[1], DING Xiaoqing (丁晓青)[1], LIANG Jian (梁健)

Affiliation: [1] Department of Electronic Engineering, Tsinghua University, Beijing 100084

Source: Journal of Tsinghua University (Science and Technology) (《清华大学学报(自然科学版)》), 2001, No. 1, pp. 29-32, 59 (5 pages in total)

Funding: National "863" High-Tech Program (863-306-03-05-6); National Natural Science Foundation of China (69682003)

Abstract: Layout analysis, understanding, and reconstruction are key problems in automatically converting paper documents into electronic formats. For Chinese newspapers with complex layouts, a bottom-up layout analysis algorithm based on nearest-neighbor connection strength and row/column confidence is proposed, together with a rule-based block-growing algorithm for layout understanding. The issues involved in layout reconstruction and its implementation are also discussed. Combining these algorithms with a Chinese character recognition engine, a complete system for automatically producing electronic publications was implemented. Experimental results and a system in practical operation showed the algorithms to be efficient and practical.

Keywords: layout analysis; layout understanding; layout reconstruction; Chinese newspapers

Classification number: G237.6 [Cultural sciences]

 
