Authors: ZHANG Shutao (张树涛), TAN Haibo (谭海波)[1], CHEN Liangfeng (陈良锋)[1], Lü Bo (吕波)[1]
Affiliations: [1] Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230039, China; [2] Graduate School, University of Science and Technology of China, Hefei 230039, China
Source: Computer Engineering (《计算机工程》), 2019, No. 11, pp. 62-67 (6 pages)
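Funding: Anhui Province Major Science and Technology Project "Big Data-Based Precision Intelligence Service Platform for Micro, Small and Medium-Sized Enterprises" (711245801052)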
Abstract: Traditional load balancing methods for distributed crawler systems consider only a small number of load-affecting factors and do not comprehensively evaluate the load on each crawler node, so tasks are allocated unreasonably. To address this problem, the paper proposes an efficient load balancing strategy for distributed crawler systems. It analyzes the factors that affect the running time of crawler nodes and uses a BP neural network to build a nonlinear, multi-factor model of node running time. The minimum variance of the running times that the model predicts for the sub-nodes serves as the objective function of the load balancing strategy, and an improved particle swarm optimization algorithm with constraints solves this objective to determine the task allocation scheme. Experimental results show that the strategy effectively shortens the running time of the distributed crawler system while meeting the high-performance requirements of the crawler nodes.
Keywords: distributed crawler; load balancing; prediction model; particle swarm optimization algorithm; constraints
Classification: TP311.5 [Automation and Computer Technology: Computer Software and Theory]
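
Illustration: the abstract's pipeline (predict each node's running time, then choose the task allocation that minimizes the variance of those predictions under constraints) can be sketched roughly as follows. This is a minimal sketch and not the paper's implementation. `predict_runtime` is a hypothetical closed-form stand-in for the trained BP neural network, the node parameters (`speed`, `overhead`) are invented for illustration, and the optimizer is a plain particle swarm with a simple repair step for the constraints (non-negative task counts that sum to the total), not the improved variant the paper describes.

```python
import random


def predict_runtime(tasks, speed, overhead):
    """Hypothetical stand-in for the BP-network model: nonlinear in task count."""
    return overhead + (tasks ** 1.2) / speed


def runtime_variance(alloc, nodes):
    """Objective: variance of the predicted running times across nodes."""
    times = [predict_runtime(t, s, o) for t, (s, o) in zip(alloc, nodes)]
    mean = sum(times) / len(times)
    return sum((x - mean) ** 2 for x in times) / len(times)


def repair(position, total):
    """Enforce the constraints: no negative share, shares sum to `total`."""
    clipped = [max(x, 0.0) for x in position]
    s = sum(clipped)
    if s == 0.0:
        return [total / len(position)] * len(position)
    return [x * total / s for x in clipped]


def pso_allocate(nodes, total, particles=30, iters=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Standard PSO over task allocations; returns (allocation, variance)."""
    rng = random.Random(seed)
    n = len(nodes)
    pos = [repair([rng.random() for _ in range(n)], total)
           for _ in range(particles)]
    pos[0] = repair([1.0] * n, total)  # seed the swarm with the equal split
    vel = [[0.0] * n for _ in range(particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [runtime_variance(p, nodes) for p in pos]
    g = min(range(particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]

    for _ in range(iters):
        for i in range(particles):
            for d in range(n):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
            # Move the particle, then project it back onto the feasible set.
            pos[i] = repair([x + v for x, v in zip(pos[i], vel[i])], total)
            cost = runtime_variance(pos[i], nodes)
            if cost < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], cost
                if cost < gbest_cost:
                    gbest, gbest_cost = pos[i][:], cost
    return gbest, gbest_cost


if __name__ == "__main__":
    # Three hypothetical crawler nodes as (speed, overhead) pairs.
    nodes = [(1.0, 2.0), (2.0, 1.0), (4.0, 0.5)]
    alloc, cost = pso_allocate(nodes, total=1000.0)
    print([round(a, 1) for a in alloc], cost)
```

Because the equal split is placed in the initial swarm, the returned allocation is never worse than that naive baseline under the stand-in model. In a real system the allocation would also need rounding to whole tasks, which this sketch leaves out.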