Method of distributed crawler task scheduling based on resource awareness


Authors: ZHANG Jun [1]; WEI Jizhen; LI Yubin (School of Information Engineering, East China University of Technology, Nanchang 330013, China)

Affiliation: [1] School of Information Engineering, East China University of Technology, Nanchang 330013, Jiangxi, China

Source: Modern Electronics Technique (《现代电子技术》), 2024, No. 9, pp. 86-90 (5 pages)

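Funding: National Natural Science Foundation of China (62162002); National Natural Science Foundation of China (61662002); Natural Science Foundation of Jiangxi Province (20212BAB202002).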

Abstract: This paper develops a resource-aware task scheduling method for distributed crawlers, aiming to optimize the use of system resources on each node in a distributed environment and to improve the execution efficiency of crawler tasks. The method introduces a resource-aware scheduling algorithm and node priority management to monitor the CPU, memory, and network resources of each node, so that crawler tasks are scheduled in a balanced way, that is, executed on nodes with lower resource utilization. This effectively relieves excessive resource occupation and load imbalance among nodes. In addition, the method uses Flask, which improves scalability and provides a visual crawler monitoring platform. Experimental results show that the proposed method significantly improves the efficiency and adaptability of crawler task execution and offers useful guidance for the further optimization of distributed crawler systems.
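The core idea in the abstract, sending each crawler task to the node with the lowest resource utilization, can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not say how CPU, memory, and network readings are combined or how node priority is computed. The weighted score, the `WEIGHTS` values, the `NodeStatus` type, and the `cost_per_task` estimate are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    """Latest resource snapshot reported by one crawler node (hypothetical type)."""
    name: str
    cpu: float      # CPU utilization, 0.0 to 1.0
    memory: float   # memory utilization, 0.0 to 1.0
    network: float  # network bandwidth utilization, 0.0 to 1.0


# Assumed weights; the abstract does not state how the resources are combined.
WEIGHTS = {"cpu": 0.5, "memory": 0.3, "network": 0.2}


def load_score(node: NodeStatus) -> float:
    """Weighted utilization; a lower score means the node has more headroom."""
    return (WEIGHTS["cpu"] * node.cpu
            + WEIGHTS["memory"] * node.memory
            + WEIGHTS["network"] * node.network)


def pick_node(nodes: list[NodeStatus]) -> NodeStatus:
    """Return the node with the lowest weighted utilization."""
    if not nodes:
        raise ValueError("no nodes available")
    return min(nodes, key=load_score)


def dispatch(tasks: list[str], nodes: list[NodeStatus],
             cost_per_task: float = 0.05) -> dict[str, str]:
    """Assign each task to the currently least-loaded node.

    After each assignment the chosen node's CPU estimate is raised by
    cost_per_task (an assumed figure) so that later tasks spread out
    instead of piling onto one node before fresh metrics arrive.
    """
    assignment = {}
    for task in tasks:
        node = pick_node(nodes)
        assignment[task] = node.name
        node.cpu = min(1.0, node.cpu + cost_per_task)
    return assignment


if __name__ == "__main__":
    cluster = [
        NodeStatus("node-a", cpu=0.90, memory=0.50, network=0.20),
        NodeStatus("node-b", cpu=0.20, memory=0.30, network=0.10),
        NodeStatus("node-c", cpu=0.60, memory=0.60, network=0.60),
    ]
    print(pick_node(cluster).name)
    print(dispatch(["task-1", "task-2", "task-3"], cluster))
```

In a real deployment the snapshots would come from periodic reports by each node rather than hard-coded values, and the local estimate in `dispatch` would be overwritten whenever a fresh report arrives.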

Keywords: distributed crawler; task scheduling; resource awareness; Flask; data collection; resource utilization

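Classification No.: TN919-34 [Electronics and Telecommunications: Communication and Information Systems]; TP303 [Electronics and Telecommunications: Information and Communication Engineering]; TP333 [Automation and Computer Technology: Computer System Architecture]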

 
