Dynamic data distribution for stream processing system  (Cited by: 3)

Authors: Wang Chengzhang (王成章)[1], Lin Xuelian (林学练)[1], Tan Jingfang (谭静芳)[2]

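Affiliations: [1] School of Computer Science and Engineering, Beihang University, Beijing 100191; [2] School of Physics and Electronic Engineering, Taishan University, Tai'an, Shandong 271021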

Source: Computer Engineering & Science, 2014, Issue 10, pp. 1846-1853 (8 pages)

Funding: National Basic Research Program of China (973 Program) (2014CB340300)

Abstract: In stream processing systems, data skew and similar factors often cause load imbalance among computing nodes, which lowers the system's processing capacity and increases the response time of data processing. Traditional load balancing methods, such as operator distribution, operator migration, and load shedding, have not been widely applied in stream processing systems because of their relatively high performance cost. Considering the characteristics of stream processing systems, a new load balancing mechanism is proposed. In this mechanism, the data on each computing unit are divided into several partitions (sections), and each partition can be allocated and migrated dynamically among computing units. By dynamically adjusting the partitions held by each computing unit with little disturbance to the running system, the mechanism balances the input streams and utilizations of the computing units and thereby achieves load balance. Based on this, we design and implement a load balancing algorithm and an online data migration method for stream processing systems. Experimental results show that the mechanism significantly reduces the average latency of data processing and improves system throughput.
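
The abstract describes partition-level rebalancing only at a high level. The Python sketch below illustrates the general idea under stated assumptions and is not the authors' algorithm: tuple keys are hashed into a fixed number of partitions (the hypothetical `NUM_PARTITIONS`), each partition's load is a single measured number such as input tuples per second, and a simple greedy rule (an assumption, not taken from the paper) chooses which partition to migrate from the most-loaded unit to the least-loaded one. The names `partition_of`, `route`, `unit_loads`, and `rebalance` are illustrative.

```python
import zlib
from typing import Dict, List, Tuple

NUM_PARTITIONS = 6  # hypothetical: number of data sections (partitions) per operator


def partition_of(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a tuple's key to a partition with a stable hash (CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(key: str, assignment: Dict[int, str]) -> str:
    """Send a tuple to the computing unit that currently owns its partition."""
    return assignment[partition_of(key)]


def unit_loads(units: List[str], assignment: Dict[int, str],
               partition_load: Dict[int, float]) -> Dict[str, float]:
    """Sum the measured load (e.g. input tuples/s) of the partitions on each unit."""
    loads = {u: 0.0 for u in units}
    for p, u in assignment.items():
        loads[u] += partition_load.get(p, 0.0)
    return loads


def rebalance(units: List[str], assignment: Dict[int, str],
              partition_load: Dict[int, float], max_moves: int = 10):
    """Greedy sketch (assumed policy): move one partition at a time from the
    most-loaded unit to the least-loaded unit. A partition of load w is moved
    only if 0 < w < gap, so the receiving unit ends below the sender's old load."""
    assignment = dict(assignment)  # do not mutate the caller's mapping
    moves: List[Tuple[int, str, str]] = []
    for _ in range(max_moves):
        loads = unit_loads(units, assignment, partition_load)
        src = max(loads, key=loads.get)
        dst = min(loads, key=loads.get)
        gap = loads[src] - loads[dst]
        best, best_score = None, None
        for p, u in assignment.items():
            w = partition_load.get(p, 0.0)
            if u == src and 0 < w < gap:
                score = abs(gap - 2 * w)  # src/dst load difference after the move
                if best is None or score < best_score:
                    best, best_score = p, score
        if best is None:  # no single move narrows the gap; stop
            break
        assignment[best] = dst
        moves.append((best, src, dst))
    return assignment, moves


if __name__ == "__main__":
    units = ["u0", "u1", "u2"]
    assignment = {p: units[p % 3] for p in range(6)}
    partition_load = {0: 50, 1: 10, 2: 10, 3: 40, 4: 10, 5: 10}  # u0 starts at 90
    new_assignment, moves = rebalance(units, assignment, partition_load)
    print(moves)  # [(3, 'u0', 'u1'), (1, 'u1', 'u2')]
    print(unit_loads(units, new_assignment, partition_load))  # {'u0': 50.0, 'u1': 50.0, 'u2': 30.0}
```

Each accepted move strictly lowers the sum of squared unit loads, so the greedy loop cannot cycle. The returned `moves` list is what an online migration step would then have to carry out (transferring partition state and redirecting in-flight tuples); that protocol is not modeled here. The paper also balances utilization alongside input rate, whereas this sketch uses a single load figure per partition.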

Keywords: data stream; stream processing; load balancing; data distribution; data migration

CLC number: TP399 [Automation and Computer Technology: Computer Application Technology]

 
