Research on Ontology-based Web Crawler Technology (cited by: 7)

A Study of Ontology-based Web Crawler

Affiliations: [1] School of Information Engineering, Huzhou Teachers College, Huzhou 313000; [2] Network Center, Ningbo University, Ningbo 315211

Source: Journal of the China Society for Scientific and Technical Information (《情报学报》), 2007, No. 5, pp. 723-727 (5 pages)

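Funding: National Natural Science Foundation of China (60573056); Key Program of the Zhejiang Provincial Natural Science Foundation (Z106335); Zhejiang Provincial Natural Science Foundation (Y105625).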

Abstract: The Web, the largest unstructured database in the world, has greatly improved access to information. However, information on the Web is largely disorganized, and the distributed nature of the World Wide Web makes it difficult to use as a tool for information and knowledge management. Users exploring the Web therefore need the support of an intelligent information discovery mechanism. After analyzing how web crawlers work and the traditional crawling algorithms, this paper proposes an approach to information discovery built on a comprehensive framework for an ontology-based web crawler. The framework includes a preprocessing module and an ontology management module and defines a strategy for computing the relevance of web pages. An empirical evaluation of the framework shows promising results.
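The abstract does not state the paper's relevance formula. Purely as an illustration, the following Python sketch shows one common way an ontology-driven focused crawler can score pages and order its crawl frontier. Everything specific here is a hypothetical assumption, not the authors' method:

- the concept weights in `ONTOLOGY_WEIGHTS`;
- the length-normalized weighted term frequency;
- the `threshold` and `limit` parameters;
- the helper names `relevance` and `focused_crawl`.

Pages are held in an in-memory dictionary, so no network access is needed.

```python
import heapq
import re
from collections import Counter

# Hypothetical ontology: concept label -> weight (higher = more central to the topic).
# An ontology management module would normally derive such weights from the concept
# hierarchy; these values are illustrative assumptions only.
ONTOLOGY_WEIGHTS = {
    "ontology": 1.0,
    "crawler": 0.9,
    "semantic": 0.7,
    "retrieval": 0.6,
}


def relevance(text: str, weights: dict[str, float]) -> float:
    """Assumed scoring: weighted frequency of ontology terms divided by token count."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    score = sum(w * counts[term] for term, w in weights.items())
    return score / len(tokens)


def focused_crawl(seeds, pages, links, weights, threshold=0.05, limit=10):
    """Best-first crawl over an in-memory link graph.

    pages: url -> page text; links: url -> list of outgoing urls.
    Pages scoring below `threshold` are recorded as visited but not expanded.
    """
    frontier = [(-1.0, url) for url in seeds]  # seeds get top priority
    heapq.heapify(frontier)  # min-heap, so priorities are negated scores
    visited, relevant = set(), []
    while frontier and len(visited) < limit:
        _, url = heapq.heappop(frontier)
        if url in visited or url not in pages:
            continue
        visited.add(url)
        score = relevance(pages[url], weights)
        if score < threshold:
            continue  # irrelevant page: do not follow its links
        relevant.append((url, round(score, 3)))
        for child in links.get(url, []):
            if child not in visited:
                # Children inherit the parent's relevance as their priority.
                heapq.heappush(frontier, (-score, child))
    return relevant
```

For example, suppose `pages = {"a": "ontology based crawler for semantic retrieval", "b": "cooking recipes and travel tips", "c": "crawler ontology"}` and `links = {"a": ["b", "c"]}`. Then `focused_crawl(["a"], pages, links, ONTOLOGY_WEIGHTS)` returns `[('a', 0.533), ('c', 0.95)]`. Page `b` is fetched but contains no ontology terms, so it is discarded. Letting children inherit the parent's score is a simple, widely used heuristic; a fuller design could also score anchor text or exploit relations between ontology concepts.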

Keywords: ontology; web crawler; semantic web; information retrieval

CLC number: TP311.13 [Automation and Computer Technology / Computer Software and Theory]
