Focused Crawling Based on Grey Wolf Algorithms  Cited by: 8


Authors: 萧婧婕 (XIAO Jing-jie); 陈志云 (CHEN Zhi-yun) (Department of Computer Science and Technology, East China Normal University, Shanghai 200062, China)

Affiliation: [1] Department of Computer Science and Technology, East China Normal University, Shanghai 200062

Source: Computer Science (《计算机科学》), 2018, Issue B11, pp. 146-148, 166 (4 pages)

Funding: Supported by the MOOC-based computer course resource construction project

Abstract: To address the difficulty focused crawlers have in reaching an optimal solution during global search, and to improve their accuracy and recall, this paper designs a focused crawler search strategy that incorporates the grey wolf algorithm. Experimental results show that, compared with the traditional breadth-first search strategy and with the genetic algorithm, which is also a swarm intelligence algorithm, the grey wolf based focused crawler performs substantially better and crawls more topic-related web pages.

Keywords: focused crawler; grey wolf algorithm; topic relevance; web page importance

CLC number: TP301.6 [Automation and Computer Technology: Computer System Architecture]

 
