出 处:《软件学报》2020年第4期1225-1239,共15页Journal of Software
基 金:国家重点研发计划(2018YFB0803600);国家自然科学基金(61702488,61732020)。
摘 要:在复杂网络理论中,core分解是一种最基本的度量网络节点“重要性”并分析核心子图的方法.Core分解广泛应用于社交网络的用户行为分析、复杂网络的可视化、大型软件的代码静态分析等应用.随着复杂网络的图数据规模和复杂性的增大,现有研究工作基于多核CPU环境设计core分解并行算法,由于CPU核数和内存带宽的局限性,已经无法满足大数据量的高性能计算需求,严重影响了复杂网络的分析应用.通用GPU提供了1万以上线程数的高并行计算能力和高于100GB/s访存带宽,已被广泛应用于大规模图数据的高效并行分析,如广度优先遍历和最短路径算法等.为了实现更为高效的core分解,提出面向GPU平台下的复杂网络core分解的两种并行策略.第1种RLCore策略基于图遍历思想,利用GPU高并发计算能力对网络图结构自底向上遍历,逐步迭代设置各节点所属的core层;第2种ESCore策略基于局部收敛思想,对各节点从邻居节点当前值进行汇聚计算更新直至收敛.ESCore相比RLCore能够大大降低遍历过程中GPU线程更新同一节点的同步操作开销,而其算法的迭代次数受收敛率的影响.在真实网络图数据上的实验结果表明,所提出的两个策略在效率和扩展性方面能够大幅优于现有其他方法,相比单线程上的算法高达33.6倍性能提升,且遍历边的吞吐性能(TEPS)达到406万条/s,单轮迭代的ESCore的执行效率高于RLCore.To analysis the complex networks,core decomposition is a basic and efficient strategy to distinguish the relative importance of nodes and to discover a special family of core subgraphs in networks.After core decomposition,every node in each k-core subgraph connects to other k neighbor nodes internally.The core decomposition has been widely applied in several application scenarios,e.g.,user behavior analysis in social networks,visualization of complex networks,static analysis in large system software project,etc.With the increasing scale and complexity of networks,existing works,which mostly focus on the multi-core CPU-based implementation of core decomposition,cannot satisfy the high performance of core decomposition in large-scale complex networks.Meanwhile,GPU provides not only massive parallelism degree(up to 10000 threads)but also efficient memory I/O bandwidth(approximately 100 GB/s),which makes it an excellent hardware platform for large graph structure analytic,such as BFS(breadth first search),SSSP(single source shortest path)algorithms.This study proposes two strategies to enhance the parallel performance of core decomposition on GPU-based platform An algorithm,RLCore,is first presented which exploits GPU-based bottom-up traverse approach and recursively distinguishes the core levels of nodes by considering their degree and edges.Then,a second optimal algorithm is proposed to improve performance and scalability,namely ESCore,based on the locality property of core decomposition.In ESCore,nodes gather and update their core level values from their neighbors,until there is no update among nodes.Compared to RLCore,ESCore strategy reduces the synchronization overhead from multi-thread contention when scaling to massive parallelism,whereas the iteration number of ESCore is depended on the convergence of nodes.From the evaluation results,two proposed acceleration algorithms achieve maximum 4.06 billion TEPS(traversed edges per second),which corresponds to up to 33.6X speedup compared to a single threaded CPU exe
关 键 词:复杂网络 GPU Core分解 大规模图数据 大数据处理
分 类 号:TP311[自动化与计算机技术—计算机软件与理论]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...