Authors: Jialun WANG, Wenhao PANG, Chuliang WENG, Aoying ZHOU
Affiliation: [1] School of Data Science and Engineering, East China Normal University, Shanghai 200062, China
Source: Frontiers of Computer Science (中国计算机科学前沿, English edition), 2023, No. 4, pp. 141-153 (13 pages)
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 61732014 and 62141214) and the National Key Research and Development Program of China (2018YFB1003400).
Abstract: In analytical queries, a number of important operators such as JOIN and GROUP BY are suitable for parallelization, and the GPU is an ideal accelerator given its parallel computing power. However, when the data size grows to hundreds of gigabytes, one GPU card becomes insufficient because of the small capacity of global memory and the slow data transfer between host and device. A straightforward solution is to add more GPUs linked by high-bandwidth connectors, but this greatly increases cost. We use unified memory (UM), provided by NVIDIA CUDA (Compute Unified Device Architecture), to make it possible to accelerate large-scale queries on just one GPU, but we notice that the transfer between host and UM, which happens before kernel execution, is often significantly slower than the theoretical bandwidth. An important reason is that, in a single-GPU environment, data processing systems usually invoke only one thread or a static number of threads for data copy, leading to inefficient transfer that heavily slows overall performance. In this paper, we present D-Cubicle, a runtime module that accelerates data transfer between host-managed memory and unified memory. D-Cubicle boosts the actual transfer speed dynamically through a self-adaptive approach. In our experiments, taking data transfer into account, D-Cubicle processes 200 GB of data on a single GPU with 32 GB of global memory, achieving 1.43x on average and up to 2.09x the performance of the baseline system.
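The abstract attributes slow host-to-UM transfer to copying with a single thread or a fixed number of threads. As a rough illustration of the multi-threaded chunked copy that this observation points toward, here is a minimal sketch. It is not D-Cubicle's implementation: the function name `parallel_copy` and its parameters are hypothetical, the self-adaptive choice of thread count described in the abstract is not reproduced, and the destination is an ordinary host buffer so the code runs without a GPU. In a CUDA program the destination would presumably be a buffer obtained from `cudaMallocManaged`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <thread>
#include <vector>

// Hypothetical helper (not from the paper): copy `bytes` bytes from `src` to
// `dst` using up to `num_threads` worker threads, one contiguous chunk each.
// Assumption: `dst` and `src` do not overlap. In a real setting `dst` would be
// unified memory; here it is any writable buffer.
void parallel_copy(void* dst, const void* src, std::size_t bytes,
                   unsigned num_threads) {
    assert(num_threads > 0);
    // Ceiling division so the chunks cover every byte.
    const std::size_t chunk = (bytes + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t offset = static_cast<std::size_t>(t) * chunk;
        if (offset >= bytes) break;  // nothing left for the remaining threads
        const std::size_t len = std::min(chunk, bytes - offset);
        workers.emplace_back([=] {
            std::memcpy(static_cast<char*>(dst) + offset,
                        static_cast<const char*>(src) + offset, len);
        });
    }
    // Wait for all chunks to finish before returning.
    for (auto& w : workers) w.join();
}
```

A caller would invoke, for example, `parallel_copy(dst, src, n, 8)` and could vary the last argument between transfers based on measured throughput, which is the kind of runtime tuning the abstract describes only at a high level.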
Keywords: data analytics; GPU; unified memory
Classification: TP333 [Automation and Computer Technology: Computer System Architecture]