Automatic Parallelization Framework for Complex Nested Loops Based on LLVM Pass  (cited by: 3)


Authors: MA Chun-Yan [1], LÜ Bing-Xu, YE Xu-Jiao, ZHANG Yu [2]

Affiliations: [1] School of Software, Northwestern Polytechnical University, Xi'an 710129, China; [2] School of Computer Science and Technology, Hainan University, Haikou 570228, China

Source: Journal of Software, 2023, No. 7, pp. 3022-3042 (21 pages)

Funding: National Natural Science Foundation of China (62192733, 62062030); Aeronautical Science Foundation (20185853038, 201907053004).

Abstract: With the spread of multi-core processors, automatic parallelization of serial code in embedded legacy systems has become an active research topic. Complex nested loops, characterized by imperfectly nested structure and non-affine dependences, remain technically challenging to parallelize automatically. This study proposes CNLPF, an automatic parallelization framework for complex nested loops based on LLVM Pass. First, a representation model for complex nested loops, the loop structure tree, is proposed, and the regular regions of nested loops are automatically converted into loop structure trees. Then, data dependence analysis is performed on the loop structure tree to construct intra-loop and inter-loop dependence relations. Finally, parallel loop code is generated based on the OpenMP shared-memory programming model. For 6 programs from the SPEC2006 benchmark set, which together contain nearly 500 complex nested loops, the proportion of complex nested loops was measured and the parallel speedup was tested. The results show that the framework can handle complex nested loops that LLVM Polly cannot optimize, strengthening LLVM's parallelizing compilation capabilities, and that combining the method with Polly improves the speedup by 9% to 43% over Polly alone.
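The abstract does not give the details of the loop structure tree or of the dependence analysis, so the following is only a toy illustration of the general idea, not CNLPF itself (which operates on LLVM IR). It assumes a hypothetical tree in which a loop body may mix statements and inner loops (an imperfect nest), and a deliberately conservative rule: a loop is a parallelization candidate only if every array written in its subtree is always accessed with one identical subscript tuple that contains the loop variable. The names `Stmt`, `Loop`, `statements`, and `is_parallel_candidate` are invented for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Stmt:
    """A statement, described by the accesses it makes.

    Each access is (array_name, subscripts), with subscripts a tuple of
    strings such as ("i", "j"). Hypothetical model, not the paper's.
    """
    writes: list
    reads: list


@dataclass
class Loop:
    """A loop node; body may mix Stmt and Loop, so imperfect nests fit."""
    var: str
    body: list = field(default_factory=list)


def statements(node):
    """Yield every Stmt in the subtree rooted at node."""
    if isinstance(node, Stmt):
        yield node
    else:
        for child in node.body:
            yield from statements(child)


def is_parallel_candidate(loop):
    """Conservative toy test for loop-carried dependences.

    Accept the loop only if each written array is accessed everywhere in
    the subtree with a single subscript tuple that includes loop.var, so
    distinct iterations touch distinct elements. Scalars (empty subscripts)
    and differing subscripts such as "i-1" are rejected.
    """
    stmts = list(statements(loop))
    written = {name for s in stmts for name, _ in s.writes}
    forms = {}
    for s in stmts:
        for name, subs in s.writes + s.reads:
            if name in written:
                forms.setdefault(name, set()).add(tuple(subs))
    return all(
        len(subs_set) == 1 and loop.var in next(iter(subs_set))
        for subs_set in forms.values()
    )


# Imperfect nest:  for i: { s[i] = 0; for j: s[i] = s[i] + a[i][j] }
inner = Loop("j", [Stmt(writes=[("s", ("i",))],
                        reads=[("s", ("i",)), ("a", ("i", "j"))])])
outer = Loop("i", [Stmt(writes=[("s", ("i",))], reads=[]), inner])

print(is_parallel_candidate(outer))  # outer loop: rows are independent
print(is_parallel_candidate(inner))  # inner loop: accumulates into s[i]
```

In this toy model the outer loop is the kind of loop one would annotate with an OpenMP `parallel for`, while the inner loop is rejected because every `j` iteration updates the same `s[i]`. A real analysis would also have to handle aliasing, reductions, and non-affine subscripts, none of which are modeled here.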

Keywords: complex nested loops; automatic parallelization; LLVM Pass; dependence analysis

Classification number: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
