Personalized Web Information Collection Based on a Meta Search Engine (cited by: 12)

Customized web crawling based on meta search engine

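Affiliations: [1] School of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan, Hubei 430073, China; [2] Alipay (China) Network Technology Co., Ltd., Hangzhou, Zhejiang 310099, China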

Source: Computer Engineering and Design (《计算机工程与设计》), 2009, No. 13, pp. 3117-3119 (3 pages)

Abstract: To reduce the network resources consumed by traditional Web crawling systems and to strengthen their support for personalization, this paper combines a user interest vector model with meta search engine technology and designs a personalized Web information crawling system based on a meta search engine. The system calls member search engines to discover target Web sites related to the user's interests, then uses a crawler program to collect the page content of those sites. Because site discovery is more targeted, the system can effectively reduce the number of crawlers. The paper focuses on the system architecture and the workflow of personalized Web crawling, and closes with the application scenarios of the system.

Keywords: meta search engine; personalization; Web information crawling; interest vector; architecture

Classification: TP393 [Automation and Computer Technology: Computer Application Technology]
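
The site-discovery step described in the abstract can be illustrated with a minimal sketch. The record gives no algorithmic detail, so everything below is an assumption made for illustration: the interest vector is modelled as a plain term-frequency vector, relevance is scored with cosine similarity, and `engine_a` and `engine_b` are hypothetical stand-ins for member search engines that return fixed results instead of querying the network. It is not the authors' implementation.

```python
import math
from collections import Counter
from urllib.parse import urlparse


def to_vector(text):
    """Build a term-frequency vector (assumed stand-in for the interest vector)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical member search engines: each returns (url, snippet) pairs.
# Real engines would issue the query over the network; these ignore it.
def engine_a(query):
    return [
        ("http://news.example.com/a1", "web crawler design for search engine"),
        ("http://shop.example.org/p9", "cheap shoes online"),
    ]


def engine_b(query):
    return [
        ("http://news.example.com/a2", "meta search engine and web crawler"),
        ("http://blog.example.net/x", "personalized search with interest vector"),
    ]


def find_target_sites(interest, engines, threshold=0.2):
    """Query every member engine and keep sites whose best hit matches the interest vector."""
    # Use the user's strongest interest terms as the query (an assumed heuristic).
    query = " ".join(term for term, _ in interest.most_common(3))
    best = {}
    for engine in engines:
        for url, snippet in engine(query):
            site = urlparse(url).netloc
            score = cosine(interest, to_vector(snippet))
            best[site] = max(best.get(site, 0.0), score)
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return [site for site, score in ranked if score >= threshold]


interest = to_vector("web crawler search engine meta search")
targets = find_target_sites(interest, [engine_a, engine_b])
print(targets)
```

Only the sites returned by `find_target_sites` would be handed to the crawler, which is one plausible reading of how targeted discovery keeps the number of crawlers small; the threshold value of 0.2 is arbitrary.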
