Research on Web Crawler Technology Based on Hadoop (基于Hadoop的网络爬虫技术研究)

Cited by: 4

Source: Journal of Jilin Engineering Normal University (《吉林工程技术师范学院学报》), 2014, No. 8, pp. 87-89 (3 pages)

Abstract: A web crawler typically starts from a seed page, reads that page's content and the links it contains, and repeats the process on each linked page until every page reachable from the seed has been found. When the volume of data to crawl is large, traditional techniques show certain drawbacks, whereas the open-source Hadoop cloud computing framework offers some advantages for data collection. After introducing the Hadoop cloud computing framework, this paper explains the principle of web crawlers and implements a web crawler based on Hadoop.
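The crawl loop described in the abstract can be illustrated with a minimal single-machine sketch. This is not the paper's implementation: the abstract does not give the traversal order or any code, so breadth-first order is an assumption, and `LINK_GRAPH`, `fetch_links`, and `crawl` are hypothetical names. An in-memory dictionary stands in for real page downloads so the example runs without network access.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
# A real crawler would download the page and parse its HTML instead.
LINK_GRAPH = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/b", "http://example.com/c"],
    "http://example.com/b": ["http://example.com/"],
    "http://example.com/c": [],
}


def fetch_links(url):
    """Stand-in for fetching a page and extracting its outgoing links."""
    return LINK_GRAPH.get(url, [])


def crawl(seed):
    """Crawl breadth-first from the seed and return URLs in discovery order."""
    discovered = [seed]       # discovery order, returned to the caller
    seen = {seed}             # fast membership check to avoid revisiting pages
    frontier = deque([seed])  # pages whose links have not been read yet
    while frontier:
        url = frontier.popleft()
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                discovered.append(link)
                frontier.append(link)
    return discovered
```

Calling `crawl("http://example.com/")` visits each reachable page exactly once, even though the pages link back to one another. On one machine the `seen` set and the frontier must fit in memory, which is one way the large-volume drawback mentioned in the abstract can show up.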

Keywords: Hadoop; web crawler; MapReduce; search engine

Classification: TP393.071 [Automation and Computer Technology: Computer Application Technology]
