Research on Methods for Extracting and Organizing Key Information in PDF Files (cited by: 12)

Extracting key information from PDF files

Source: Computer Engineering and Design (《计算机工程与设计》), 2007, No. 7, pp. 1688-1690 (3 pages)

Abstract: In many applications of PDF, understanding and processing the document is important. A common first step is to extract relevant keywords and phrases from the document so that hyperlinks can be created automatically within and outside the document, which makes it easier to build an electronic document. This paper presents a new method that extracts key information (key words, phrases, or regions) from PDF files, organizes it, and stores it in a file called a KIU file, so that hyperlinks can be generated automatically without actually touching the PDF files. Domain-specific knowledge about the document is used to aid the extraction process. Once the location and extent of the text are found, significant keywords or phrases are extracted with the help of Optical Character Recognition (OCR) software.

Keywords: PDF files; key information; text extraction; Standard Generalized Markup Language (SGML); hyperlinks

Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]
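
The abstract's idea of keeping extracted key information in a separate file and deriving hyperlinks from it, rather than editing the PDF, can be illustrated with a minimal sketch. The abstract does not describe the KIU file format, so everything below is an assumption made for illustration: the `KeyEntry` record, its `phrase`/`page`/`bbox` fields, the JSON layout, and the rule that later occurrences of a phrase link back to its first occurrence are all hypothetical. The extraction and OCR steps are not modelled; the entries are assumed to have been produced already.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class KeyEntry:
    """One extracted key phrase (hypothetical record, not the paper's KIU format)."""
    phrase: str
    page: int      # 1-based page number where the phrase was found
    bbox: tuple    # (x0, y0, x1, y1) region on the page, units assumed


def build_index(entries):
    """Group entries by case-insensitive phrase."""
    index = {}
    for entry in entries:
        index.setdefault(entry.phrase.lower(), []).append(entry)
    return index


def make_links(index):
    """Link every later occurrence of a phrase back to its first occurrence.

    This linking rule is an assumption chosen to keep the example small.
    """
    links = []
    for phrase, occurrences in index.items():
        ordered = sorted(occurrences, key=lambda e: (e.page, e.bbox[1], e.bbox[0]))
        target = ordered[0]
        for source in ordered[1:]:
            links.append({
                "phrase": phrase,
                "from_page": source.page,
                "from_bbox": list(source.bbox),
                "to_page": target.page,
            })
    return links


def to_sidecar_json(entries):
    """Serialize entries and derived links to a sidecar document.

    The PDF itself is never opened or modified here.
    """
    payload = {
        "entries": [asdict(e) for e in entries],
        "links": make_links(build_index(entries)),
    }
    return json.dumps(payload, indent=2)


if __name__ == "__main__":
    sample = [
        KeyEntry("OCR", 3, (10, 20, 40, 30)),
        KeyEntry("ocr", 1, (5, 5, 20, 15)),
        KeyEntry("hyperlink", 2, (0, 0, 10, 10)),
    ]
    print(to_sidecar_json(sample))
```

Because the links are stored next to the PDF instead of inside it, a viewer or converter could apply them at display time, which matches the abstract's goal of hyperlinking without touching the source file.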
