Tuning Experiments on Meteorological Data Processing Based on the MapReduce Computing Model  Times cited: 8

A Set of MapReduce Tuning Experiments Based on Meteorological Operations


Authors: YANG Runzhi[1], SHEN Wenhai[1], XIAO Weiqing[1], HU Kaixi[1], YANG Xin[1], WANG Ying[1], TIAN Wei[2]

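Affiliations: [1] National Meteorological Information Center, Beijing 100081; [2] Nanjing University of Information Science & Technology, Nanjing 210044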

Source: Journal of Applied Meteorological Science, 2014, No. 5, pp. 618-628 (11 pages)

Abstract: Cloud computing uses distributed computing techniques to deliver the computing power and efficiency of parallel computing, which overcomes the limited computing capacity of a single server. Climate normals calculated from long-series historical data are important for real-time and near-real-time meteorological operations and for scientific research. Long-series historical data are large in volume and their processing logic is complex, so compiling them on a traditional single-node platform takes a very long time. This paper builds a cluster-mode cloud computing platform on the Hadoop distributed computing framework. Using long-series historical data as the source, it implements several data compilation algorithms on the MapReduce computing model to shorten processing time. The source data consist of many files, each of them small. For this reason, the storage format and file size of the source data were modified. The SequenceFile format and plain-text file merging were each applied to the same scenario and compared for processing time, with merges of 10 files and of 100 files tested in both cases. This improved processing time even further.

Cloud computing technology uses distributed computing to achieve the computing power and computational efficiency of parallel computing, which overcomes the low computing power of a standalone server. Cloud computing is a new application model for distributed computing. It can provide reliable, customized services to the largest possible number of users with minimal resources. Combining it with other theories and techniques is also an important way to carry out cloud computing research and practical applications. Cloud computing is widely applied in many industries and fields, and its flexibility, ease of use and stability are gradually gaining recognition. In the meteorological department, cloud-based platforms for scientific computing are still very limited, but some attempts have been made as cloud computing has matured. In meteorological operations, large-scale scientific computing and other general-purpose computing models run on high-performance server clusters. Resources are limited, and so is the number of HPC nodes, so scientific computing still relies on the traditional standalone or cluster mode. Exploring an internal cloud computing platform for conventional general-purpose computing is therefore very meaningful for the meteorological department. Sixty years of valuable long-series historical data are stored at the National Meteorological Information Center for real-time and near-real-time operations and for research. Processing these historical data is time-consuming, so new methods have been implemented. A cluster-mode Hadoop cloud computing platform is built, and a variety of statistical methods are implemented with the MapReduce computing model. The storage format of the source data is changed to SequenceFile, which consists of serialized 〈Key, Value〉 pairs. In this way, multiple Format-A files are merged into one large SequenceFile to test the change in computational efficiency. Meanwhile, many small files are merged into a larger text file. Configurations are modified
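The abstract describes computing climate normals with MapReduce over many small source files that are packed into larger containers before processing. Below is a minimal single-process Python sketch of that flow (pack, map, shuffle, reduce). It is not the authors' Hadoop implementation. Several details are assumptions made for illustration only. The record layout `station,year,month,temperature` is assumed, as are the station IDs `A001`/`B002`, the 1981 to 2010 reference period and the definition of a normal as a simple arithmetic mean. The helper names `pack_small_files`, `map_phase`, `shuffle`, `reduce_phase` and `run_job` are also hypothetical.

```python
from collections import defaultdict
from statistics import mean


def pack_small_files(files):
    """Pack many small text files into one ordered list of (key, value)
    records keyed by file name. This only mimics the idea behind a Hadoop
    SequenceFile (serialized <Key, Value> pairs); it does not write that
    binary format."""
    return sorted(files.items())


def map_phase(key, value, first_year, last_year):
    """Map step: for one packed file, emit ((station, month), temperature)
    for each observation whose year lies inside the reference period.
    The key (file name) is unused here. The line layout
    "station,year,month,temperature" is a hypothetical assumption."""
    for line in value.splitlines():
        if not line.strip():
            continue  # skip blank lines
        station, year, month, temp = line.split(",")
        if first_year <= int(year) <= last_year:
            yield (station, int(month)), float(temp)


def shuffle(pairs):
    """Shuffle step: group mapped values by key, as the framework does
    between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: the normal for one (station, month) is taken here as the
    arithmetic mean of its values over the period (an assumed definition)."""
    return key, mean(values)


def run_job(records, first_year, last_year):
    """Run map -> shuffle -> reduce over the packed records in one process."""
    mapped = (
        pair
        for key, value in records
        for pair in map_phase(key, value, first_year, last_year)
    )
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, vs) for k, vs in grouped.items())


if __name__ == "__main__":
    # Three tiny "files" for hypothetical stations A001 and B002.
    small_files = {
        "A001_jan.txt": "A001,1981,1,-3.0\nA001,1982,1,-5.0\n",
        "A001_jul.txt": "A001,1981,7,26.0\nA001,1982,7,27.0\nA001,1979,7,30.0\n",
        "B002_jan.txt": "B002,1981,1,4.0\n",
    }
    records = pack_small_files(small_files)
    for (station, month), normal in sorted(run_job(records, 1981, 2010).items()):
        print(station, month, normal)
```

Packing matters in Hadoop because each small file generally becomes its own input split and map task and also consumes NameNode memory. Merging files, whether into a SequenceFile or a larger text file, reduces that overhead. This in-memory sketch only shows the data flow and does not measure that effect.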

Keywords: MapReduce; cloud computing; Hadoop; historical data compilation

Classification number: P468 [Astronomy and Earth Sciences: Atmospheric Science and Meteorology]; P409
