Design and Implementation of Multi-threads Web Crawler (多线程网络爬虫的设计与实现)

Cited by: 3

Source: Computer Development & Applications (《电脑开发与应用》), 2012, No. 6, pp. 65-67, 70 (4 pages)

Abstract: The volume of information on the Internet is growing rapidly. To improve crawling performance and make the crawler program more general-purpose, this paper analyzes the principles and architecture of web crawlers and then designs and implements a High-speed and Multi-threads Web Crawler (HMWC). The crawler processes web pages in parallel with multiple threads and controls crawl depth by combining breadth-first and depth-first strategies. Experimental results show that HMWC reduces the average waiting time during page download and performs well.

Keywords: search engine; web crawler; multi-threading; URL queue; breadth-first

Classification: TP391 [Automation and Computer Technology / Computer Application Technology]
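
The abstract describes worker threads that process pages in parallel from a shared URL queue under a depth limit. The following is a minimal sketch of that general idea, not the HMWC implementation: it uses a plain breadth-first queue with a depth cap rather than the paper's combined breadth-first and depth-first strategy, which the abstract does not specify. The names `crawl`, `fetch_links`, `max_depth`, and `num_threads` are hypothetical, and `fetch_links` stands in for whatever downloads a page and extracts its links.

```python
import queue
import threading


def crawl(seed, fetch_links, max_depth=2, num_threads=4):
    """Crawl outward from `seed` and return {url: depth} for every URL queued.

    `fetch_links(url)` is a hypothetical callable (an assumption of this
    sketch) that downloads a page and returns the URLs it links to.
    """
    frontier = queue.Queue()   # shared FIFO URL queue, roughly breadth-first
    seen = {seed: 0}           # url -> depth at which it was first discovered
    lock = threading.Lock()    # protects `seen` against concurrent updates

    frontier.put((seed, 0))

    def worker():
        while True:
            item = frontier.get()
            if item is None:   # sentinel: no more work, shut down
                frontier.task_done()
                return
            url, depth = item
            try:
                # Pages at the depth limit are recorded but not expanded.
                if depth < max_depth:
                    for link in fetch_links(url):
                        with lock:
                            if link in seen:
                                continue
                            seen[link] = depth + 1
                        frontier.put((link, depth + 1))
            except Exception:
                pass           # a failed download must not kill the worker
            finally:
                frontier.task_done()

    threads = [threading.Thread(target=worker, daemon=True)
               for _ in range(num_threads)]
    for t in threads:
        t.start()

    frontier.join()            # returns once every queued URL is processed
    for _ in threads:
        frontier.put(None)     # one sentinel per worker
    for t in threads:
        t.join()
    return seen
```

While one thread waits on a download, the others keep taking URLs from the queue, which is how a design of this kind reduces average waiting time. Because workers run concurrently, the recorded depth is the depth of first discovery and is not guaranteed to be the shortest path from the seed.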
