Design and implementation of PDF reader (PDF阅读器的设计与实现)

Cited by: 10

Source: Computer Engineering and Design (《计算机工程与设计》), 2010, No. 7, pp. 1635-1638 (4 pages)

Abstract: To extract text, images, and graphics from PDF (Portable Document Format) files effectively, the paper proposes an implementation model for a PDF reader comprising four units: file preprocessing, display preprocessing, function extension, and display. Based on the structural characteristics of PDF files, it puts forward a parsing approach that ignores secondary information and locates the key positions in the file. On this basis, it gives detailed solutions for data streams processed by three filters: FlateDecode, DCTDecode, and CCITTFaxDecode. The content of each PDF page is then parsed in two passes, and data structures for text, graphics, and related elements are designed to store the results. Finally, data utilization and function extension are discussed. Experimental results show that the model extracts and displays PDF information well, which benefits the further development and use of PDF in the field of Chinese information processing.

Keywords: portable document format; reader; file parsing; image extraction; information processing

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
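
The abstract describes two ideas that a short example can make concrete: locating key positions in the file while skipping everything else, and decoding a FlateDecode stream. The following Python sketch is only an illustration under stated assumptions, not the paper's implementation, which this record does not show. The function names `find_startxref` and `decode_flate_streams` are hypothetical, and the stream search uses a simplified regular expression instead of the cross-reference table and `/Length` entry that a full parser would rely on. DCTDecode (JPEG) and CCITTFaxDecode streams are not handled here.

```python
import re
import zlib


def find_startxref(pdf_bytes: bytes) -> int:
    """Return the byte offset written after the last 'startxref' keyword.

    This is one example of a "key position": the offset points at the
    cross-reference section, so a reader can jump there directly.
    """
    idx = pdf_bytes.rfind(b"startxref")
    if idx == -1:
        raise ValueError("no startxref keyword found")
    match = re.match(rb"startxref\s+(\d+)", pdf_bytes[idx:])
    if match is None:
        raise ValueError("startxref is not followed by an offset")
    return int(match.group(1))


def decode_flate_streams(pdf_bytes: bytes) -> list:
    """Inflate every stream whose dictionary names /FlateDecode.

    Simplification (assumption): the dictionary is whatever text precedes
    'stream', and the data runs up to the first 'endstream'.
    """
    pattern = re.compile(rb"<<(.*?)>>\s*stream\r?\n(.*?)endstream", re.DOTALL)
    results = []
    for match in pattern.finditer(pdf_bytes):
        dictionary, raw = match.group(1), match.group(2)
        if b"/FlateDecode" not in dictionary:
            continue  # other filters are ignored in this sketch
        # decompressobj tolerates the end-of-line bytes that precede
        # 'endstream'; they end up in .unused_data instead of raising.
        results.append(zlib.decompressobj().decompress(raw))
    return results


if __name__ == "__main__":
    content = b"BT /F1 12 Tf (Hello) Tj ET"
    packed = zlib.compress(content)
    sample = (
        b"%PDF-1.4\n1 0 obj\n<< /Length " + str(len(packed)).encode()
        + b" /Filter /FlateDecode >>\nstream\n" + packed
        + b"\nendstream\nendobj\nstartxref\n1234\n%%EOF\n"
    )
    print(find_startxref(sample))
    print(decode_flate_streams(sample))
```

FlateDecode data is a zlib stream, which is why the standard `zlib` module is enough for this step. The decoded bytes of a page content stream would then be the input to the two parsing passes mentioned in the abstract.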
