Application of PageRank Algorithm in Topic Web Crawler

Cited by: 1

Authors: YU Linxuan, LI Yeli [1], ZENG Qingtao [1]

Affiliation: [1] Integrated Laboratory for Applied Research and Services of Key Technologies in Press and Publication Field, Beijing Institute of Graphic Communication, Beijing 102600, China

Source: Journal of Beijing Institute of Graphic Communication, 2020, No. 10, pp. 143-147 (5 pages)

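Funding: Beijing Science and Technology Innovation Service Capacity Building Project (PXM2016_014223_000025); Guangdong Province Science and Technology Major Special Project (190826175545233).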

Abstract: With the continuous development of network information technology, the web has filled with large volumes of unstructured data of many kinds, commonly referred to as big data. Such data, however, is not easy to store in a local database for access and processing. People are increasingly aware of how important it is to retrieve the latest useful information efficiently from a web that is highly varied and full of noise. Gathering this information by hand is laborious, which led to the emergence of web crawler technology. Existing search engines nevertheless still have shortcomings in topic similarity judgment and in page ranking algorithms. This paper therefore applies the PageRank algorithm to a topic crawler and builds a vertical search engine.

Keywords: crawler; PageRank; topic

Classification code: TP311 [Automation and Computer Technology - Computer Software and Theory]
