Topic Crawling Based on a Web Page Content-Block Strategy    Cited by: 2

Block-based topic crawling

Source: Computer Engineering and Applications (《计算机工程与应用》), 2008, Issue 9, pp. 143-146 (4 pages)

Abstract: The rapid growth of the Internet poses great challenges to traditional crawlers and search engines, and search engines aimed at specific domains and specific user groups have emerged in response. The web topic information search system (web spider) is the most important part of a topic search engine: it collects web pages that match the topic and either returns them to the user or stores them in an index database. Because the information resources on the web are so extensive, how to collect content of interest comprehensively and efficiently is the central question of web spider research. This paper proposes a topic crawling strategy based on web page block segmentation. Experimental results show that, compared with other crawling algorithms, the proposed algorithm achieves higher efficiency, precision, and recall, as well as a better tunneling ability.

Keywords: topic-specific search; topic crawling; search engine; crawling algorithm; relevance analysis

Classification: TP18 [Automation and Computer Technology - Control Theory and Control Engineering]
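
The abstract does not describe the algorithm itself, so the following is only an illustrative sketch of the general idea behind block-based topic crawling, not the authors' method. It assumes that each page has already been split into content blocks, that a block's relevance can be approximated by the fraction of its words that are topic terms, and that each link inherits the score of the block containing it. The names `Block`, `relevance`, `crawl`, and `TOY_WEB` are hypothetical, and the in-memory toy web stands in for real page fetching and segmentation.

```python
import heapq
import re
from dataclasses import dataclass, field


@dataclass
class Block:
    """One content block of a page: its text and the links it contains."""
    text: str
    links: list = field(default_factory=list)


def relevance(text, topic_terms):
    """Fraction of the block's words that are topic terms (a stand-in measure)."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in topic_terms)
    return hits / len(words)


def crawl(web, seed, topic_terms, max_pages=10):
    """Best-first crawl: each link is prioritized by the score of its block."""
    frontier = [(-1.0, seed)]  # heapq is a min-heap, so scores are negated
    seen = {seed}
    visited = []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        visited.append(url)
        for block in web.get(url, []):
            score = relevance(block.text, topic_terms)
            for link in block.links:
                # Simplification: a link keeps the score of the first block
                # in which it was discovered.
                if link not in seen:
                    seen.add(link)
                    heapq.heappush(frontier, (-score, link))
    return visited


# Hypothetical toy web: page id -> list of blocks (assumed already segmented).
TOY_WEB = {
    "seed": [
        Block("focused crawler search engine research", ["a"]),
        Block("advertisement buy cheap shoes", ["b"]),
    ],
    "a": [Block("crawler relevance ranking", ["c"])],
    "b": [],
    "c": [],
}

order = crawl(TOY_WEB, "seed", {"crawler", "search", "engine", "relevance"})
print(order)
```

In this sketch the link found in the on-topic block of the seed page is followed before the link found in the advertisement block. Links from low-scoring blocks stay in the frontier with low priority instead of being discarded, which is one simple way to leave room for tunneling.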
