Web Information Extraction System Based on Ontology

Cited by: 14

Source: Computer Engineering and Design (《计算机工程与设计》), 2012, No. 7, pp. 2634-2639 (6 pages)

Funding: Henan Province Soft Science Research Program (112400450172); Natural Science Foundation of the Henan Provincial Department of Education (2009A520027)

Abstract: Existing information extraction systems rely on methods that are not reusable and cannot extract semantic information. To address these problems, a topic-oriented Web information extraction framework based on domain ontology is proposed. For Chinese Web pages, the framework draws on external data and uses the ontology to interpret information. The document collection and preprocessing stage is analyzed and designed, covering source documents and information collection, document preprocessing, document storage, and the document database. For the text conversion stage, algorithms for word segmentation, vocabulary lookup, and named entity recognition are presented. Finally, a knowledge extraction scheme is given. Experimental results show that the method achieves relatively high extraction performance.

Keywords: ontology; information extraction; Web page; key technology; extraction framework

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
