Design and Implementation of Web Information Extraction and Visualization System
(Original title: Web信息抽取和展现系统的设计与实现)

Cited by: 1

Source: Electric Power Information Technology (《电力信息化》), 2012, Issue 2, pp. 23-26 (4 pages)

Abstract: With the rapid development of computer network technologies, efficiently and accurately identifying and acquiring Web information has become critically important. This paper describes a complete Web information extraction and visualization system whose overall architecture consists of four parts: a collection of Web sites, an extraction rule repository, a content customization module, and a content presentation module. The system lets users define information extraction rules through a visual, interactive interface and stores each user's personalized rules. To locate data items, it combines a DOM-tree-based method with hierarchical region partitioning and validates the data using parent and child node information. This lets it navigate quickly to the target extraction region while maintaining extraction precision.

Keywords: Web information extraction; extraction rules; HTML; DOM tree

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
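
The abstract describes locating data items by descending a DOM tree through layered regions and then validating each candidate against its parent and child nodes. The abstract does not give the system's actual rule format or algorithm, so the following is only a minimal sketch of that general idea. The `RULE` dictionary, its field names, the sample `PAGE`, and both helper functions are hypothetical, and the example assumes well-formed markup so that Python's standard `xml.etree.ElementTree` parser can be used.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page (assumed well-formed, not taken from the paper).
PAGE = """<html><body>
<div id="nav"><ul><li><a href="/home">Home</a></li></ul></div>
<div id="news"><ul>
  <li><a href="/n/1">Grid upgrade announced</a></li>
  <li>No link here</li>
  <li><a href="/n/2">Substation report</a></li>
</ul></div>
</body></html>"""

# Hypothetical extraction rule; the paper's real rule schema is not known.
RULE = {
    "region_path": ["body", "div[@id='news']", "ul"],  # layered regions, outermost first
    "item_tag": "li",        # candidate data-item nodes inside the target region
    "parent_tag": "ul",      # validation: each item must sit directly under this tag
    "required_child": "a",   # validation: each item must contain this child tag
}


def locate_region(root, region_path):
    """Descend one region layer at a time; return None if a layer is missing."""
    node = root
    for step in region_path:
        node = node.find(step)  # direct-child lookup, optional [@attr='value']
        if node is None:
            return None
    return node


def extract_items(root, rule):
    """Collect items in the target region that pass parent/child validation."""
    region = locate_region(root, rule["region_path"])
    if region is None:
        return []
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in region.iter() for child in parent}
    results = []
    for item in region.iter(rule["item_tag"]):
        parent = parent_of.get(item)
        child = item.find(rule["required_child"])
        if parent is None or parent.tag != rule["parent_tag"] or child is None:
            continue  # fails the parent or child check, so skip it
        results.append({"text": (child.text or "").strip(),
                        "href": child.get("href")})
    return results


if __name__ == "__main__":
    for row in extract_items(ET.fromstring(PAGE), RULE):
        print(row["href"], row["text"])
```

In this sketch the layered `region_path` narrows the search to one subtree before any items are examined, which corresponds to the fast-navigation step, while the parent and child checks discard candidates such as the list entry without a link. Real-world HTML is often not well-formed, so a practical system would need a tolerant HTML parser rather than an XML one.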
