大数据流式计算:关键技术及系统实例  被引量:316

Big Data Stream Computing: Technologies and Instances

在线阅读下载全文

作  者:孙大为[1] 张广艳[1,2] 郑纬民[1] 

机构地区:[1]清华大学计算机科学与技术系,北京100084 [2]符号计算与知识工程教育部重点实验室(吉林大学),吉林长春130012

出  处:《软件学报》2014年第4期839-862,共24页Journal of Software

基  金:国家自然科学基金(61170008;61272055);国家重点基础研究发展计划(973)(2014CB340402);吉林大学符号计算与知识工程教育部重点实验室资助项目(93K172012K12)

摘  要:大数据计算主要有批量计算和流式计算两种形态,目前,关于大数据批量计算系统的研究和讨论相对充分,而如何构建低延迟、高吞吐且持续可靠运行的大数据流式计算系统是当前亟待解决的问题且研究成果和实践经验相对较少.总结了典型应用领域中流式大数据所呈现出的实时性、易失性、突发性、无序性、无限性等特征,给出了理想的大数据流式计算系统在系统结构、数据传输、应用接口、高可用技术等方面应该具有的关键技术特征,论述并对比了已有的大数据流式计算系统的典型实例,最后阐述了大数据流式计算系统在可伸缩性、系统容错、状态一致性、负载均衡、数据吞吐量等方面所面临的技术挑战.Batch computing and stream computing are two important forms of big data computing. The research and discussions on batch computing in big data environment are comparatively sufficient. But how to efficiently deal with stream computing to meet many requirements, such as low latency, high throughput and continuously reliable running, and how to build efficient stream big data computing systems, are great challenges in the big data computing research. This paper provides a research of the data computing architecture and the key issues in stream computing in big data environments. Firstly, the research gives a brief summary of three application scenarios of stream computing in business intelligence, marketing and public service. It also shows distinctive features of the stream computing in big data environment, such as real time, volatility, burstiness, irregularity and infinity. A well-designed stream computing system always optimizes in system structure, data transmission, application interfaces, high-availability, and so on. Subsequently, the research offers detailed analyses and comparisons of five typical and open-source stream computing systems in big data environment. Finally, the research specifically addresses some new challenges of the stream big data systems, such as scalability, fault tolerance, consistency, load balancing and throughput.

关 键 词:大数据计算 流式计算 流式大数据 内存计算 系统实例 

分 类 号:TP311[自动化与计算机技术—计算机软件与理论]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象