检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
作 者:汤嘉武 郑龙[2] 廖小飞[3] 金海[4] Tang Jiawu;Zheng Long;Liao Xiaofei;Jin Hai(College of Computer Science and Technology,Huazhong University of Science and Technology,Wuhan 430074;National Engineering Research Center for Big Data Technology and System(Huazhong University of Science and Technology),Wuhan 430074;Key Laboratory of Services Computing Technology and System(Huazhong University of Science and Technology),Ministry of Education,Wuhan 430074;Key Laboratory of Cluster and Grid Computing(Huazhong University of Science and Technology),Wuhan 430074)
机构地区:[1]华中科技大学计算机科学与技术学院,武汉430074 [2]大数据技术与系统国家地方联合工程研究中心(华中科技大学),武汉430074 [3]服务计算技术与系统教育部重点实验室(华中科技大学),武汉430074 [4]集群与网格计算湖北省重点实验室(华中科技大学),武汉430074
出 处:《计算机研究与发展》2021年第3期467-478,共12页Journal of Computer Research and Development
基 金:国家重点研发计划项目(2018YFB1003502);国家自然科学基金项目(61702201,61825202,61832006)。
摘 要:图计算已成为大数据处理领域的主流应用,采用特定硬件加速可以显著提高图计算的性能和能效.众所周知,硬件代码的编写和验证十分耗时,尽管通用高层次综合(high level synthesis,HLS)系统允许用户使用高级语言(如C语言)特性自动生成硬件结构,但是对于图计算这种不规则算法,其仍缺乏有效的并行性和访存技术支撑,存在综合效果不理想、效率不高等突出问题.提出一种面向图计算的高效HLS方法,结合图算法嵌套循环、随机访存、数据冲突以及幂律分布等特性,采用数据流架构实现高效的并行流水线,保证处理单元的负载均衡.通过提供的编程原语,提出的方法可将通用图算法转化为模块化的数据流中间表示形式,进而映射到参数化的硬件模板.在Xilinx Virtex UltraScale+XCVU9P的实现验证了方法的正确性,不同类型的图算法在多个数据集上的实验结果表明,相比国际上通用的Spatial HLS系统,提出的方法可达到7.9~30.6倍的性能提升.Graph processing has become one of the mainstream big data applications.For graph applications such as biological networks,social networks,and Web graphs,traditional GPU and CPU architectures suffer in terms of power consumption and performance due to graph algorithms characteristics.It is demonstrated that specialized hardware acceleration can significantly promote the performance and energy-efficiency of graph processing.As we know,writing and verifying the correct hardware-level codes are tedious and time-consuming.Although general-purpose high level synthesis(HLS)systems allow users to write the applications using high-level languages such as C by automatically generating it into the underlying hardware codes.However,for the irregular graph applications,these HLS systems still lack effective support for massive parallelism and memory access,potentially leading to significantly low performance.In this paper,we propose an effective HLS for high-performance graph processing.We adopt the dataflow architecture to achieve efficient parallel pipelining,ensuring load balancing.Through the developed programming primitives,users can quickly customize the vertex-centric graph algorithm and translate it into a modular intermediate representation(IR),which in turn maps to a parameterized hardware template.We build our HLS on Xilinx Virtex UltraScale+XCVU9P.Results on a variety of graph algorithms and datasets show that our HLS system can outperform state-of-the-art spatial by 7.9-30.6x speedups.
关 键 词:图计算 高层次综合 数据流架构 中间表示 FPGA
分 类 号:TP302.1[自动化与计算机技术—计算机系统结构]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.195