Design and Implementation of Search Engine Based on Web Crawler

Cited by: 2

Authors: GAO Wen-chao [1]; LI Hao-yuan; XU Yong-kang (School of Mechanical Electronic and Information Engineering, China University of Mining and Technology-Beijing, Beijing 100083, China)

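Affiliation: [1] School of Mechanical Electronic and Information Engineering, China University of Mining and Technology (Beijing), Beijing 100083, China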

Source: Computer Knowledge and Technology (《电脑知识与技术》), 2020, No. 30, pp. 6-9, 12 (5 pages in total)

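Funding: Fundamental Research Funds for the Central Universities (Project No. 2020YQJD15); China University of Mining and Technology (Beijing) Undergraduate Education and Teaching Reform and Research Project (Project No. J200513); National Undergraduate Innovation Training Program (Project No. C202004828).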

Abstract: In the Internet era, as the amount of information grows, providing users with convenient search services becomes more challenging. Storing information at large scale and searching it accurately is very costly, so a balance must be found between search speed and cost. This paper implements a search engine based on a web crawler. The software is divided into three parts: the crawler, the database, and the front-end display. The paper also describes how to extend the system into a distributed crawler. On the hardware side, multiple hosts are required; on the software side, the system uses Scrapy crawlers, a database, and the Django framework. The result is a web crawler system with good robustness and extensibility.

Keywords: crawler; information; search engine; database; web framework

Classification: TP393 [Automation and Computer Technology: Computer Application Technology]
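
The three-part structure named in the abstract (crawler, database, front-end query) can be illustrated with a minimal sketch. This is not the authors' implementation: the paper uses Scrapy and Django, while the sketch below uses only the Python standard library so that it runs without network access. The in-memory `FAKE_WEB` pages, the `page` table, and the function names `crawl`, `build_store`, and `search` are all hypothetical and chosen for illustration only.

```python
import sqlite3
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "web" so the sketch runs offline (assumption, not from the paper).
FAKE_WEB = {
    "http://example.test/": '<a href="http://example.test/a">A</a> web crawler home',
    "http://example.test/a": '<a href="http://example.test/">home</a> search engine notes',
}


class LinkAndTextParser(HTMLParser):
    """Collects href targets and text content from one page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for key, value in attrs:
                if key == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text.append(data)


def crawl(seed, fetch):
    """Crawler part: breadth-first traversal that visits each URL at most once."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # unknown page: skip it
            continue
        parser = LinkAndTextParser()
        parser.feed(html)
        parser.close()  # flush any buffered trailing text
        yield url, " ".join(" ".join(parser.text).split())
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)


def build_store(pages):
    """Database part: persist (url, body) pairs in an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page (url TEXT PRIMARY KEY, body TEXT)")
    conn.executemany("INSERT OR REPLACE INTO page VALUES (?, ?)", pages)
    conn.commit()
    return conn


def search(conn, keyword):
    """Query part: substring match, standing in for what a front-end view would call."""
    rows = conn.execute(
        "SELECT url FROM page WHERE body LIKE ? ORDER BY url",
        (f"%{keyword}%",),
    )
    return [row[0] for row in rows]


conn = build_store(crawl("http://example.test/", FAKE_WEB.get))
print(search(conn, "search"))
```

In a distributed setting such as the one the abstract mentions, the `seen` set and the queue would have to be shared across hosts; how the paper does this is not stated in the abstract, so the sketch keeps both in a single process.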
