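Authors: HUANG Zhi (黄志); SU Chuancheng (苏传程); SU Xiaohong (苏晓红)
Affiliation: [1] Guangxi Zhuang Autonomous Region Meteorological Information Center (Guangxi Meteorological Information Center), Nanning 530022
Source: Meteorological Science and Technology (《气象科技》), 2022, No. 1, pp. 51-58 (8 pages)
Funding: Supported by the 2021 Guangxi Meteorological Research Program, directive project (桂气科2021ZL02).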
Abstract: The existing CIMISS (China Integrated Meteorological Information Sharing System) falls seriously short of supporting large-volume queries over long time series, multiple stations, and multiple meteorological elements. This study uses the monthly report data of historical surface meteorological records from Guangxi meteorological stations, covering the period from each station's establishment to the present, together with the physical resources of an existing Hadoop cluster. The ETL process is redesigned, a Parquet-format dataset is constructed, and the data are converted and stored on HDFS. Spark Broadcast variables are embedded and the Spark cluster's execution parameters are tuned, which improves the processing parallelism of the cluster and the efficiency of join queries in Spark SQL. The results show that the maximum compression ratio of the Parquet-format dataset exceeds 95%, that the efficiency of one-off large-volume queries is 1 to 5 times higher than before, and that highly concurrent access is supported, providing effective technical support for related forecasting and prediction services.
Keywords: Hadoop; Spark; ETL; Parquet; columnar storage; Broadcast
Classification: P409 [Astronomy and Earth Sciences: Atmospheric Science and Meteorology]
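The abstract mentions Spark Broadcast variables as a way to speed up join queries but does not describe how they are applied. The following is a minimal pure-Python sketch of the general idea behind a broadcast (map-side) hash join, not the authors' implementation and not Spark's API: a small table is replicated to every partition of a large table so that each partition can be joined locally without shuffling. The function name `broadcast_hash_join`, the field names, and the sample station and observation records are all hypothetical.

```python
from collections import defaultdict


def broadcast_hash_join(partitions, small_table, key):
    """Join every partition of a large table against one small, replicated table."""
    # Build the lookup once. In Spark, this small table would be shipped
    # to each executor as a broadcast variable (assumption for illustration).
    lookup = defaultdict(list)
    for row in small_table:
        lookup[row[key]].append(row)

    joined = []
    # Each partition is joined independently against the same lookup,
    # so no data from the large table has to move between partitions.
    for partition in partitions:
        for row in partition:
            for match in lookup.get(row[key], []):
                joined.append({**match, **row})
    return joined


# Hypothetical small table: station metadata.
stations = [
    {"station_id": "S001", "name": "Station A"},
    {"station_id": "S002", "name": "Station B"},
]

# Hypothetical large table, already split into partitions: monthly observations.
observation_partitions = [
    [{"station_id": "S001", "month": "2020-01", "mean_temp_c": 12.5}],
    [
        {"station_id": "S002", "month": "2020-01", "mean_temp_c": 13.1},
        {"station_id": "S999", "month": "2020-01", "mean_temp_c": 9.0},
    ],
]

result = broadcast_hash_join(observation_partitions, stations, "station_id")
```

This behaves like an inner join: the observation for the unknown station `S999` is dropped because it has no match in the small table. The approach only pays off when the replicated table is small enough to fit in the memory of every worker.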