Analysis of Spark Performance Optimization Method


Authors: WEI Tongbian; WU Jiangbo; SU De; ZHANG Liang; WEI Tongming (Guangxi Key Laboratory of Automobile Four New Features, SAIC GM Wuling Automobile Co., Ltd., Liuzhou, Guangxi 545007, China)

Affiliation: [1] Guangxi Key Laboratory of Automobile Four New Features, SAIC GM Wuling Automobile Co., Ltd., Liuzhou, Guangxi 545007, China

Source: Information & Computer (《信息与电脑》), 2022, No. 2, pp. 53-55 (3 pages)

Abstract: With the rapid development of the Internet of Things and advances in science and technology, data volumes across all industries are growing at unprecedented speed and scale, and extracting valuable data quickly from these massive volumes has become a key concern for enterprises. Spark is currently the most popular open-source big data processing framework, but because of its complex underlying mechanisms and the limits of cluster resources, Spark applications often run into problems such as insufficient memory and long task execution times. To address this, the paper analyzes and summarizes Spark application performance from four aspects: development principles, partitioning and the format of the data being read, cluster parallelism, and the structured APIs, with the aim of optimizing resource allocation and improving development efficiency.

Keywords: Internet of Things; value; computing; Spark; parallelism

Classification code: TP311.13 [Automation and Computer Technology: Computer Software and Theory]

 
