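Authors: 鲁亮 (Lu Liang)[1], 于炯 (Yu Jiong)[1,2], 卞琛 (Bian Chen)[2], 英昌甜 (Ying Changtian)[2,3], 师康利 (Shi Kangli), 蒲勇霖 (Pu Yonglin)
Affiliations: [1] School of Information Science and Engineering, Xinjiang University, Urumqi 830046; [2] School of Software, Xinjiang University, Urumqi 830008; [3] Postdoctoral Research Station of Electrical Engineering, Xinjiang University, Urumqi 830047
Source: 《计算机应用》 (Journal of Computer Applications), 2018, No. 3, pp. 699-706 (8 pages)
Funding: National Natural Science Foundation of China (61462079; 61562086); Natural Science Foundation of Xinjiang Uygur Autonomous Region (2017D01A20); Scientific Research Program of Colleges and Universities of Xinjiang Uygur Autonomous Region (XJEDU2016S106); Graduate Research and Innovation Project of Xinjiang Uygur Autonomous Region (XJGRI2016028)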
Abstract: Apache Storm, a typical platform for big data stream computing, uses round-robin scheduling by default. This scheduler ignores both the differences in computational cost among the tasks of a topology and the different communication patterns between tasks, which leaves considerable room for optimization in load balancing and communication cost. To address this problem, a Task Scheduling Algorithm based on Weight in Storm (TSAW-Storm) is proposed. The algorithm first takes the CPU occupation of each task as the vertex weight of the topology and the tuple rate of the data stream between each pair of tasks as the edge weight. It then builds up the set of tasks hosted on each work node step by step by maximizing the gain in edge weight, so that, with cluster load balance ensured, as many heavily weighted inter-node data streams as possible become intra-node ones, which reduces network transmission overhead. Experimental results show that, in the WordCount benchmark with 8 work nodes, TSAW-Storm reduces system latency and inter-node tuple rate by 30.0% and 32.9% respectively compared with the Storm default scheduling algorithm, and the standard deviation of CPU load across work nodes is only 25.8% of that under the default scheduler. In a further comparison with an online scheduling algorithm, TSAW-Storm reduces latency, inter-node tuple rate and standard deviation of CPU load by 7.76%, 11.8% and 5.93% respectively, with markedly lower execution overhead, which effectively improves the operating efficiency of the Storm system.
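Keywords: big data; stream computing; Storm; weight; task scheduling; load balancing; communication cost
CLC number: TP311 [Automation and Computer Technology: Computer Software and Theory]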
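The abstract describes the scheduling idea only in outline: CPU cost as vertex weight, tuple rate as edge weight, and greedy placement that maximizes the edge weight kept inside a node while the load stays balanced. The Python sketch below illustrates that general idea on a toy topology. It is not the authors' TSAW-Storm implementation: the function names, the `slack` capacity factor, the heaviest-task-first ordering and the tie-breaking rule are all assumptions made for illustration, and the weights are invented toy values.

```python
from statistics import pstdev


def greedy_weighted_schedule(vertex_weight, edge_weight, nodes, slack=1.1):
    """Assign tasks to nodes, favouring nodes that already host heavy peers.

    vertex_weight: task -> CPU cost; edge_weight: (task_a, task_b) -> tuple rate.
    slack (an assumed parameter) bounds each node's load relative to the mean.
    """
    capacity = slack * sum(vertex_weight.values()) / len(nodes)

    # Undirected adjacency so a task sees both upstream and downstream peers.
    adjacency = {task: {} for task in vertex_weight}
    for (a, b), weight in edge_weight.items():
        adjacency[a][b] = adjacency[a].get(b, 0.0) + weight
        adjacency[b][a] = adjacency[b].get(a, 0.0) + weight

    placement = {}
    load = {node: 0.0 for node in nodes}

    # Assumed ordering: heaviest tasks first, names break ties deterministically.
    for task in sorted(vertex_weight, key=lambda t: (-vertex_weight[t], t)):
        def gain(node):
            # Edge weight that becomes intra-node if the task is placed here.
            return sum(w for peer, w in adjacency[task].items()
                       if placement.get(peer) == node)

        fits = [n for n in nodes if load[n] + vertex_weight[task] <= capacity]
        # If no node has room left, fall back to the least-loaded node.
        candidates = fits or [min(nodes, key=lambda n: load[n])]
        best = max(candidates, key=lambda n: (gain(n), -load[n]))
        placement[task] = best
        load[best] += vertex_weight[task]
    return placement, load


def round_robin_schedule(tasks, nodes):
    """Baseline: deal tasks out to nodes in turn, ignoring all weights."""
    return {task: nodes[i % len(nodes)] for i, task in enumerate(tasks)}


def inter_node_traffic(placement, edge_weight):
    """Total tuple rate carried by edges whose endpoints sit on different nodes."""
    return sum(w for (a, b), w in edge_weight.items()
               if placement[a] != placement[b])


# Toy topology: two spouts (s1, s2) feeding two bolts (b1, b2).
vertex_weight = {"s1": 1.0, "s2": 1.0, "b1": 1.0, "b2": 1.0}
edge_weight = {("s1", "b2"): 10.0, ("s2", "b1"): 10.0,
               ("s1", "b1"): 1.0, ("s2", "b2"): 1.0}
nodes = ["n1", "n2"]

placement, load = greedy_weighted_schedule(vertex_weight, edge_weight, nodes)
baseline = round_robin_schedule(list(vertex_weight), nodes)

print("weighted inter-node traffic:", inter_node_traffic(placement, edge_weight))
print("round-robin inter-node traffic:", inter_node_traffic(baseline, edge_weight))
print("load standard deviation:", pstdev(load.values()))
```

On this toy input the weighted placement keeps both heavy streams inside a node, while the round-robin baseline sends them across nodes. The two quantities printed at the end correspond to the metrics the abstract reports, inter-node tuple rate and the standard deviation of CPU load.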