检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
作 者:张帅[1] 李涛[1] 焦晓帆 王艺峰[1] 杨愚鲁[1]
机构地区:[1]南开大学计算机与控制工程学院计算机科学与信息安全系,天津300071
出 处:《计算机研究与发展》2015年第11期2555-2567,共13页Journal of Computer Research and Development
基 金:国家自然科学青年基金项目(61212005;61201424);天津市科技特派员项目(14JCTPJC00501);高等学校博士学科点专项科研基金项目(20130031120029)
摘 要:谱聚类是数据挖掘领域最常用的聚类算法之一,但对于如何利用多核CPU与资源有限的众核加速器设计并实现一个在异构单节点上能够处理大规模数据集的高效谱聚类算法,目前尚无理想的解决方案.PSCH(parallel spectral clustering for hybrids)算法是专为CPU-GPU异构计算环境设计的并行T近邻(T-nearest-neighbors,TNN)谱聚类算法,通过分块计算相似性矩阵打破了GPU设备内存的限制,所能处理的数据集规模仅受限于CPU主存的容量.PSCH算法中使用CUDA设计实现双缓冲轮转4段流水机制,通过重叠计算与传输在打破存储瓶颈的同时保证了高计算性能.PSCH算法采用隐式重启动Lanczos方法(implicitly restarted Lanczos method,IRIM)在异构硬件上计算稀疏特征矩阵的特征分解,减轻了特征分解步骤的计算瓶颈.PSCH算法在配有一块GTX 480GPU的单节点上能够对百万以上规模的数据集进行聚类,并对实验中的4个数据集取得了相对于使用16进程的MPI并行谱聚类PSC算法2.0~4.5倍的性能.Spectral clustering is one of the most popular clustering algorithms in the data mining field.However,this algorithm suffers from the storage and computational bottlenecks heavily when dealing with large-scale datasets.Current work focuses on improving the spectral clustering on both algorithm and implementation levels.But how to design an efficient spectral clustering algorithm,which can handle million scale datasets on a single node with multicore CPU and manycore accelerators,is still an unsolved problem.A parallel spectral clustering using T-nearest-neighbors(TNN)and its implementation for CPU-GPU heterogeneous computing environment,named parallel spectral clustering for hybrids(PSCH),is proposed in this paper.It breaks the GPU device memory limitation by partitioning the TNN similarity matrix into blocks,so the dataset scale only subjects to the size of the host memory.In PSCH,the 4-stage pipeline mechanism with dual rotating buffers is designed to compute the TNN similarity matrix using CUDA,which keeps all the CPU,GPU,and PCIe bus busy to achieve high performance gains while breaking the device memory limitation.The implicitly restarted Lanczos method(IRIM)on GPU is employed for the eigen-decomposition of the sparse TNN similarity matrix,alleviating the computational bottleneck of the eigensolver.The results show that PSCH is highly-efficient at exploring the GPU memory bandwidth and hybrid CPU-GPU computation power.PSCH is able to cluster million scale datasets on a single node equipped with one GTX 480 GPU and achieve 2.0~4.5times performance gains compared with the MPI parallel spectral clustering implementation PSC using 16 processes for 4datasets.
关 键 词:谱聚类 T近邻 CPU-GPU异构计算 计算统一设备架构 OpenMP
分 类 号:TP311[自动化与计算机技术—计算机软件与理论]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.43