检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
作 者:廖启华 聂凯 韩林 陈梦尧 谢汶兵 LIAO Qihua;NIE Kai;HAN Lin;CHEN Mengyao;XIE Wenbing(School of Computer and Artificial Intelligence,Zhengzhou University,Zhengzhou 450000,China;National Supercomputing Center in Zhengzhou,Zhengzhou University,Zhengzhou 450000,China;Wuxi Advanced Technology Research Institute,Wuxi,Jiangsu 214000,China)
机构地区:[1]郑州大学计算机与人工智能学院,郑州450000 [2]郑州大学国家超级计算郑州中心,郑州450000 [3]无锡先进技术研究院,江苏无锡214000
出 处:《计算机科学》2024年第12期100-109,共10页Computer Science
基 金:2022年河南省重大科技专项(221100210600);2022求是科研启动(自)(32213247)。
摘 要:现有的多面体编译框架(如Pluto,LLVM/Polly和GCC/Graphite)在进行循环分块时,都采用了固定分块大小,无法充分发挥不同硬件的缓存特性,导致存在较大的性能差异。针对这一问题,涌现了许多基于多级缓存和数据局部性的循环分块算法,但这些算法往往只能优化特定循环程序或者缺乏综合考虑,不适合移植到通用编译器中。文中提出了一种基于数据局部性的循环分块选择算法,该算法不仅考虑了缓存替换策略的影响,还考虑了多核环境下的负载均衡问题。算法基于LLVM中的Polly模块实现,并选用Pluto和PolyBench中的部分测试用例进行单核和多核测试。实验结果表明,单核环境下,相比LLVM/Polly的默认分块方法,该算法在两种硬件平台下分别获得了平均2.03和2.05的加速比,且在多核环境下具有良好的并行可扩展性。The existing polyhedral compilation frameworks(such as Pluto,LLVM/Poly and GCC/Graphite)use fixed block sizes when performing loop tiling,which cannot fully utilize the caching characteristics of different hardware,resulting in significant performance differences.In response to this issue,many loop tiling algorithms based on multi-level caching and data locality have emerged,but these algorithms often only optimize specific loop programs or lack comprehensive consideration,and are not suitable for transplantation into general compilers.This paper proposes a tile size selection algorithm based on data locality,which not only considers the impact of cache replacement strategy,but also considers the load balancing problem in multi-core environments.The algorithm is implemented based on the Polly module in LLVM,and some test cases from Pluto and PolyBench are selected for single core and multi-core testing.The experimental results show that compared to the default partitioning method of LLVM/Polly,the proposed algorithm achieves an average acceleration ratio of 2.03 and 2.05 on two hardware platforms in a single core environment,and has good parallel scalability in a multi-core environment.
关 键 词:数据局部性 多面体模型 循环分块 分块大小 负载均衡
分 类 号:TP311[自动化与计算机技术—计算机软件与理论]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:18.117.186.60