A Performance Optimization Method Based on Dynamic Topology for Stream Computing and Its Implementation in Storm  Cited by: 7

Authors: LU Jia-wei (陆佳炜)[1]; WU Han (吴涵); CHEN Hong (陈烘); ZHANG Yuan-ming (张元鸣)[1]; LIANG Qian-hui (梁倩卉); XIAO Gang (肖刚)[1]

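Affiliations: [1] Department of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, Zhejiang 310023, China; [2] Team of Big Data Computing and Service, Department of Infrastructure Business, Alibaba, Hangzhou, Zhejiang 310011, China; [3] School of Computer Science and Engineering, Nanyang Technological University, Singapore 637457, Singapore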

Source: Acta Electronica Sinica (《电子学报》), 2020, No. 5, pp. 878-890 (13 pages)

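Funding: National Natural Science Foundation of China (No.61976193); Natural Science Foundation of Zhejiang Province (No.LY19F020034); Key Research and Development Program of Zhejiang Province (No.2018C01064).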

Abstract: Responsiveness and stability have always been two critical concerns in stream computing. Under overload, however, stream computing systems often show increased processing latency and topology instability, and cannot adapt to dynamic changes in data load. To address this problem, we propose a performance optimization method based on dynamic topology for stream computing. It comprises four parts: (1) Dynamic step-by-step backpressure: a task in the topology dynamically adjusts the rate at which its upstream sends data to it, according to its own current load. (2) Stateless topology data replay: the topology does not maintain the computation state of the data, yet achieves data fault tolerance as far as possible. (3) Adaptive topology replacement: task concurrency is adjusted automatically without suspending the topology. (4) Delayed persistent queue: disk IO reads and writes in the topology are deferred out of the data processing path, which mitigates the impact of high-frequency IO blocking on the stream computing system. We implement all four schemes in Apache Storm. Performance tests show that, compared with the default Storm implementation, the optimized system adapts better to dynamically changing big-data loads and, in the best case, improves throughput by 17% and data processing speed by about 20%.
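As an illustration of the dynamic step-by-step backpressure idea in the abstract, the following is a minimal Python sketch, not the authors' Storm implementation. It assumes a bounded input queue per task and a simple rule: halve the upstream rate above a high watermark, raise it by one below a low watermark. The names `Task`, `next_rate`, and `simulate`, the watermark values, and the rate rule are all hypothetical and do not come from the paper.

```python
from collections import deque


class Task:
    """Hypothetical downstream task with a bounded input queue."""

    def __init__(self, capacity=100, high_mark=80, low_mark=30):
        self.capacity = capacity
        self.high_mark = high_mark  # queue length treated as "overloaded"
        self.low_mark = low_mark    # queue length treated as "has headroom"
        self.queue = deque()

    def offer(self, item):
        """Accept an item unless the queue is full."""
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append(item)
        return True

    def process(self, n):
        """Consume up to n queued items; return how many were consumed."""
        done = 0
        while self.queue and done < n:
            self.queue.popleft()
            done += 1
        return done


def next_rate(task, rate, min_rate=1, max_rate=100):
    """Rate the upstream should use next tick, derived from the task's load."""
    backlog = len(task.queue)
    if backlog >= task.high_mark:
        return max(min_rate, rate // 2)  # overloaded: ask upstream to halve
    if backlog <= task.low_mark:
        return min(max_rate, rate + 1)   # headroom: let upstream speed up
    return rate                          # in between: keep the current rate


def simulate(ticks, service=10, rate=20):
    """Run upstream -> task for a number of ticks (assumed fixed service rate)."""
    task = Task()
    rejected = 0
    history = []
    for _ in range(ticks):
        for i in range(rate):
            if not task.offer(i):
                rejected += 1
        task.process(service)
        rate = next_rate(task, rate)
        history.append((rate, len(task.queue)))
    return history, rejected


if __name__ == "__main__":
    for rate, backlog in simulate(12)[0]:
        print(f"rate={rate:3d} backlog={backlog:3d}")
```

In a multi-stage topology the same rule would be applied hop by hop, with each task throttling only its direct upstream, which is one possible reading of "step-by-step"; the paper's actual signalling mechanism is not described in this record.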

Keywords: data stream topology; stream computing; big data; stream computing system; performance optimization

Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory]

 
