A Web Information Extractor Based on the Combination of Ontology and DOM

Times cited: 5

Authors: Liu Jiagang (柳佳刚)[1], Chen Shan (陈山)[1], He Lingya (贺令亚)[1]

Affiliation: [1] Department of Computer Science, Hunan Institute of Technology, Hengyang 421002, China

Source: New Technology of Library and Information Service (《现代图书情报技术》), 2009, No. 5, pp. 44-49 (6 pages)

Abstract: Information extraction based on an ontology of Web-page information items cannot accurately delimit the region of a page from which information should be extracted. To address this weakness, this paper designs an improved Web information extractor that combines an ontology with the DOM. Using the DOM tree, it designs an algorithm that inductively learns the paths of information items in sample Web pages. The algorithm accurately delimits the information-extraction region and reduces page noise, thereby preprocessing the Web page before extraction. Experiments show that the improved method increases the precision of Web information extraction.
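The abstract does not spell out the induction algorithm, so the following is only a rough illustration of the general idea of learning an extraction region from the DOM paths of labeled items in sample pages. It assumes a simple scheme that may differ from the authors' method: on each sample page, take the deepest common ancestor path of the labeled items; then merge the per-page results, turning sibling positions that vary between pages into wildcards. Nodes outside the learned region are treated as noise. The path representation (lists of `(tag, position)` steps) and all function names (`common_prefix`, `generalize`, `induce_region`, `in_region`, `to_xpath`) are hypothetical and chosen for this sketch.

```python
from typing import List, Optional, Tuple

# Hypothetical representation (assumption): a DOM path is a list of (tag, position)
# steps, where position is the 1-based index among same-tag siblings, as in XPath.
Step = Tuple[str, int]
Pattern = List[Tuple[str, Optional[int]]]  # position None = wildcard


def common_prefix(paths: List[List[Step]]) -> List[Step]:
    """Deepest common ancestor of all labeled item nodes on one page (exact step match)."""
    prefix: List[Step] = []
    for steps in zip(*paths):
        if all(step == steps[0] for step in steps):
            prefix.append(steps[0])
        else:
            break
    return prefix


def generalize(region_paths: List[List[Step]]) -> Pattern:
    """Merge per-page regions: tags must agree; positions that differ become None."""
    merged: Pattern = []
    for steps in zip(*region_paths):
        if len({tag for tag, _ in steps}) != 1:
            break
        positions = {pos for _, pos in steps}
        merged.append((steps[0][0], steps[0][1] if len(positions) == 1 else None))
    return merged


def induce_region(samples: List[List[List[Step]]]) -> Pattern:
    """samples[i] holds the DOM paths of the labeled items on sample page i."""
    return generalize([common_prefix(items) for items in samples])


def in_region(pattern: Pattern, path: List[Step]) -> bool:
    """True if the node at `path` lies inside the learned region; otherwise treat it as noise."""
    if len(path) < len(pattern):
        return False
    return all(
        tag == t and (pos is None or pos == p)
        for (tag, pos), (t, p) in zip(pattern, path)
    )


def to_xpath(pattern: Pattern) -> str:
    """Render the learned region as an XPath-like string."""
    return "/" + "/".join(tag if pos is None else f"{tag}[{pos}]" for tag, pos in pattern)


if __name__ == "__main__":
    page1 = [
        [("html", 1), ("body", 1), ("div", 2), ("table", 1), ("tr", 1), ("td", 2)],
        [("html", 1), ("body", 1), ("div", 2), ("table", 1), ("tr", 2), ("td", 2)],
    ]
    page2 = [
        [("html", 1), ("body", 1), ("div", 3), ("table", 1), ("tr", 1), ("td", 2)],
        [("html", 1), ("body", 1), ("div", 3), ("table", 1), ("tr", 4), ("td", 2)],
    ]
    region = induce_region([page1, page2])
    print(to_xpath(region))  # /html[1]/body[1]/div/table[1]
```

In this toy run the item rows differ within each page, so each page's region stops at the enclosing `table`; across pages the `div` position varies, so it becomes a wildcard. A real extractor would obtain the paths by parsing the HTML into a DOM tree and would then apply ontology-based extraction only to nodes for which `in_region` holds.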

Keywords: information extraction; wrapper; ontology; Document Object Model (DOM); inductive learning

CLC classification number: TP391.1 [Automation and Computer Technology / Computer Application Technology]
