Survey of Distributed Computing Frameworks for Supporting Big Data Analysis (cited by: 3)

Authors: Xudong Sun, Yulin He, Dingming Wu, Joshua Zhexue Huang

Affiliations: [1] College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China; [2] Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen 518107, China

Source: Big Data Mining and Analytics (大数据挖掘与分析 (英文)), 2023, No. 2, pp. 154-169 (16 pages)

Funding: Supported by the National Natural Science Foundation of China (No. 61972261) and the Basic Research Foundations of Shenzhen (Nos. JCYJ20210324093609026 and JCYJ20200813091134001).

Abstract: Distributed computing frameworks are a fundamental component of distributed computing systems. They provide an essential means of supporting the efficient processing of big data on clusters or in the cloud. However, the size of big data is growing faster than the big data processing capacity of clusters. As a result, distributed computing frameworks based on the MapReduce computing model are inadequate for big data analysis tasks, which often require running complex analytical algorithms on extremely large data sets, often terabytes in size. In performing such tasks, these frameworks face three challenges: computational inefficiency due to high I/O and communication costs, non-scalability to big data due to memory limits, and a limited range of analytical algorithms, because many serial algorithms cannot be implemented in the MapReduce programming model. New distributed computing frameworks need to be developed to overcome these challenges. In this paper, we review the MapReduce-type distributed computing frameworks currently used to handle big data and discuss their problems in big data analysis. In addition, we present a non-MapReduce distributed computing framework that has the potential to overcome these big data analysis challenges.
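To make the MapReduce computing model referred to in the abstract concrete, the following is a minimal single-process sketch of its three phases (map, shuffle, reduce), using word count as the example job. It is an illustration under stated assumptions, not the authors' code or any framework's API: all function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `run_mapreduce`) are hypothetical, and each in-memory "partition" stands in for a data block that a real cluster would assign to a separate worker. In a framework such as Hadoop MapReduce, the intermediate key-value pairs produced by the map phase are written to local disk and transferred over the network to reducers during the shuffle, which is one source of the I/O and communication costs mentioned above.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the user-supplied mapper to every record, collecting (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle_phase(pairs):
    """Group all values by key. In a real cluster this step moves data between nodes."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups, reducer):
    """Apply the user-supplied reducer to each key and its list of values."""
    return {key: reducer(key, values) for key, values in groups.items()}


def word_count_mapper(line):
    # Emit (word, 1) for every whitespace-separated word in the line.
    return [(word, 1) for word in line.split()]


def word_count_reducer(word, counts):
    # Sum the partial counts for one word.
    return sum(counts)


def run_mapreduce(partitions, mapper, reducer):
    """Hypothetical driver: map each partition, shuffle all pairs, then reduce."""
    pairs = []
    for partition in partitions:  # each partition would run on its own worker
        pairs.extend(map_phase(partition, mapper))
    groups = shuffle_phase(pairs)
    return reduce_phase(groups, reducer)


partitions = [["big data big clusters"], ["data on clusters"]]
result = run_mapreduce(partitions, word_count_mapper, word_count_reducer)
print(result)  # {'big': 2, 'data': 2, 'clusters': 2, 'on': 1}
```

Because every job must be expressed as independent map and reduce functions separated by a full shuffle, an iterative algorithm would typically need to rerun this whole pipeline once per iteration, which illustrates why the abstract describes many serial algorithms as hard to implement efficiently in this model.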

Keywords: distributed computing frameworks; big data analysis; approximate computing; MapReduce computing model

CLC classification: TP311.13 [Automation and Computer Technology / Computer Software and Theory]
