Authors: WEI Tongbian (韦统边); WU Jiangbo (吴江波); SU De (苏德); ZHANG Liang (张亮); WEI Tongming (韦通明) (Guangxi Key Laboratory of Automobile Four New Features, SAIC GM Wuling Automobile Co., Ltd., Liuzhou, Guangxi 545007, China)
Affiliation: [1] Guangxi Key Laboratory of Automobile Four New Features, SAIC GM Wuling Automobile Co., Ltd., Liuzhou, Guangxi 545007
Source: Information & Computer (《信息与电脑》), 2022, No. 2, pp. 53-55 (3 pages)
Abstract: With the rapid development of the Internet of Things and advances in technology, the volume of data in every sector of society is growing at an unprecedented speed and scale, and quickly extracting valuable information from massive datasets has become a key concern for enterprises. Spark is currently the most popular open-source big data processing framework, but because of its complex underlying mechanisms and limited cluster resources, Spark applications frequently suffer from problems such as insufficient memory and long task execution times. This paper therefore analyzes and summarizes Spark application performance from four aspects: development principles, partitioning and data read formats, cluster parallelism, and the structured APIs, with the goal of optimizing resource allocation and improving development efficiency.
Classification number: TP311.13 [Automation and Computer Technology: Computer Software and Theory]
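The abstract names partitioning and cluster parallelism among its four tuning aspects but gives no figures. As a rough illustration only (not the authors' method), the sketch below applies two widely cited starting points: the Spark tuning guide's rule of thumb of 2-3 tasks per CPU core, and sizing input partitions by the 128 MB default of `spark.sql.files.maxPartitionBytes`. The helper names `suggested_parallelism` and `input_partitions` are hypothetical, and the partition count is a simplification: Spark's actual file-split logic also accounts for `spark.sql.files.openCostInBytes` and the cluster's default parallelism.

```python
import math

# Default value of spark.sql.files.maxPartitionBytes (128 MB).
DEFAULT_MAX_PARTITION_BYTES = 128 * 1024 * 1024


def suggested_parallelism(num_executors: int, cores_per_executor: int,
                          tasks_per_core: int = 2) -> int:
    """Hypothetical helper: total CPU cores times a tasks-per-core factor.

    Follows the Spark tuning guide's rule of thumb of 2-3 tasks per CPU core.
    """
    if min(num_executors, cores_per_executor, tasks_per_core) <= 0:
        raise ValueError("all arguments must be positive")
    total_cores = num_executors * cores_per_executor
    return total_cores * tasks_per_core


def input_partitions(input_bytes: int,
                     max_partition_bytes: int = DEFAULT_MAX_PARTITION_BYTES) -> int:
    """Hypothetical helper: simplified partition count for splittable input.

    Ignores spark.sql.files.openCostInBytes and per-file boundaries, so it
    only approximates what Spark's file scan actually produces.
    """
    if input_bytes < 0 or max_partition_bytes <= 0:
        raise ValueError("input_bytes must be >= 0 and max_partition_bytes > 0")
    # At least one partition; otherwise ceil(total size / max chunk size).
    return max(1, math.ceil(input_bytes / max_partition_bytes))


if __name__ == "__main__":
    print(suggested_parallelism(10, 4))      # 80 tasks for 40 cores
    print(input_partitions(10 * 1024 ** 3))  # 10 GiB -> 80 partitions
```

For example, 10 executors with 4 cores each give `suggested_parallelism(10, 4) == 80`, a value one might try for `spark.default.parallelism` or `spark.sql.shuffle.partitions` as a starting point before measuring the job's actual behavior.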