Distributed Parallel Task Scheduling on Spark-GPU Framework for Oceanographic Geospatial Data Processing  Cited by: 4


Authors: JING Hui (景辉), QIN Bo (秦勃) [1], JIANG Xiao-Yi (姜晓轶) [2], XIA Hai-Tao (夏海涛) (Computer Science and Technology Department, Ocean University of China, Qingdao 266100, China; The Key Laboratory of Digital Oceanic Science and Technology, National Marine Data and Information Service, Tianjin 300171, China)

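Affiliations: [1] College of Information Science and Engineering, Ocean University of China, Qingdao, Shandong 266100 [2] Key Laboratory of Digital Ocean Science and Technology, State Oceanic Administration, Tianjin 300171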

Source: Periodical of Ocean University of China (《中国海洋大学学报(自然科学版)》), 2018, Issue A02, pp. 180-186 (7 pages)

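Funding: Supported by the project on applied research into a cloud computing and cloud service architecture framework for marine environmental information (931146140)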

Abstract: Processing large-scale, long-time-series Oceanographic Geospatial Data (OGD) is computation-intensive. This paper focuses on task scheduling for distributed parallel OGD processing on Spark with GPUs, aiming to improve processing efficiency for large-scale, long-time-series OGD and to satisfy real-time interaction requirements. Spark's native scheduling algorithms (FIFO, FAIR) perform poorly on computation-intensive tasks, showing low efficiency and long execution times. To solve this problem, the paper presents a Spark-GPU Framework (SGF), which consists of the Spark-GPU Scheduler (SGS) and the Spark-GPU Runtime (SGR). SGS takes the computation load of each GPU task and the computing capacity of each GPU device as scheduling factors. The resulting problem is the well-known scheduling problem on unrelated parallel machines, which SGS solves with a polynomial-time 2-approximation algorithm. SGR uses JNI+CUDA as the GPU task runtime; this approach needs only one JNI call, which makes it efficient and easy to program and debug. The main contributions are: (1) an improved Spark-GPU framework that supports more balanced scheduling of large-scale computation tasks; (2) a scheduling algorithm for large-scale computation tasks on heterogeneous GPU devices that accounts for GPU tasks with different computation loads and GPU devices with different computing capacities. The line integral convolution algorithm for flow field visualization serves as the test application. On a cluster with 10 GPU nodes, scheduling tests with 1,000 to 2,000 fields show that, compared with native Spark scheduling, SGF reduces execution time by 14% to 18% and improves the GPU time occupancy ratio by 10% to 20%.
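
The scheduling idea in the abstract (weigh each task's computation load against each GPU's computing capacity) can be illustrated with a minimal sketch. This is not the paper's 2-approximation algorithm, which the abstract does not spell out; it is a simple greedy heuristic that places each task on the GPU where it would finish earliest. The names `task_work`, `gpu_speed`, `greedy_schedule`, and `occupancy` are hypothetical, and the run-time model `task_work[i] / gpu_speed[j]` is an assumption. That model is the simpler uniform-machine case, whereas the unrelated-machine formulation allows an arbitrary run time for every (task, GPU) pair.

```python
from typing import Dict, List, Tuple


def greedy_schedule(
    task_work: List[float], gpu_speed: List[float]
) -> Tuple[Dict[int, List[int]], float]:
    """Greedy heuristic: put each task on the GPU where it finishes earliest.

    task_work[i] is an assumed workload estimate for task i.
    gpu_speed[j] is an assumed throughput for GPU j.
    Estimated run time of task i on GPU j is task_work[i] / gpu_speed[j].
    Returns (assignment, makespan), where assignment maps GPU index to task indices.
    """
    if not gpu_speed:
        raise ValueError("at least one GPU is required")
    finish = [0.0] * len(gpu_speed)  # current finish time per GPU
    assignment: Dict[int, List[int]] = {j: [] for j in range(len(gpu_speed))}
    # Handle the largest tasks first so small ones can fill the gaps later.
    order = sorted(range(len(task_work)), key=lambda i: task_work[i], reverse=True)
    for i in order:
        best = min(
            range(len(gpu_speed)),
            key=lambda j: finish[j] + task_work[i] / gpu_speed[j],
        )
        finish[best] += task_work[i] / gpu_speed[best]
        assignment[best].append(i)
    return assignment, max(finish)


def occupancy(
    assignment: Dict[int, List[int]],
    task_work: List[float],
    gpu_speed: List[float],
    makespan: float,
) -> List[float]:
    """Fraction of the makespan that each GPU spends busy."""
    if makespan == 0:
        return [0.0] * len(gpu_speed)
    return [
        sum(task_work[i] for i in assignment[j]) / gpu_speed[j] / makespan
        for j in range(len(gpu_speed))
    ]


if __name__ == "__main__":
    work = [8.0, 6.0, 4.0, 2.0]   # hypothetical per-task workloads
    speed = [2.0, 1.0]            # hypothetical GPU throughputs
    plan, span = greedy_schedule(work, speed)
    print(plan, span, occupancy(plan, work, speed, span))
```

In this sketch the makespan stands in for execution time and the busy fraction stands in for the GPU time occupancy ratio reported in the abstract; both are illustrative proxies, not the paper's measurement method.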

Keywords: Spark; cloud computing; distributed parallel processing; GPU; task scheduling; unrelated parallel machine scheduling

Classification: TP338.8 [Automation and Computer Technology: Computer System Architecture]

 
