Identifying Possibly Parallel Regions Using Average Execution Time of Regions and Data Dependence Profiling

Cited by: 1

Authors: Zhang Chao (张超)[1], Wang Lei (王蕾)[1], Xiang Xiaoya (向晓娅)[1], Feng Xiaobing (冯晓兵)[1]

Affiliation: [1] Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences

Source: Chinese Journal of Computers (《计算机学报》), 2008, No. 10, pp. 1745-1753 (9 pages)

Funding: Supported by the National Basic Research Program of China (973 Program), grant 2005CB321602

Abstract: As multi-core processors become the prevailing trend in processor design, applications must execute in parallel to keep improving performance. Traditional automatic parallelization works well for regular loops in scientific codes, but it cannot yet effectively parallelize irregular programs that contain many function calls and pointer references. This paper presents a method for identifying possibly parallel regions (PPRs) using the average execution time of regions and data dependence profiling, so that such irregular programs can be parallelized efficiently. The main contributions are as follows. (1) The method automatically identifies parallelism at several levels of granularity: not only the fine-grained parallelism between loop iterations that traditional analysis handles, but also the coarse-grained parallelism between loop bodies and between function call sites, which traditional analysis cannot yet handle effectively. From the many candidate regions in a program, it selects those worth parallelizing through a benefit analysis based on the average execution time of each region. (2) The method automatically identifies the number and types of data dependences between possibly parallel regions, together with the program variables that cause them. Based on these analysis results, the authors parallelize four SPEC2006 test cases using Behavior Oriented Parallelism (BOP), a speculative parallelization system that is also used to verify the correctness of the program transformation. The parallelized programs achieve average speedups of 300% and 260% on Intel and AMD multi-core processors, respectively.
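The two analyses named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record layout, the names `RegionProfile`, `select_candidates`, and `summarize_dependences`, and the simple threshold rule standing in for the benefit analysis are all assumptions made for illustration. The sketch assumes that a profiling run has already produced per-region timing totals and a list of observed inter-region dependences.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class RegionProfile:
    """Hypothetical profile record for one candidate region."""
    name: str          # label for a loop body, loop iteration, or call site
    total_time: float  # total profiled time spent in the region, in seconds
    executions: int    # number of times the region was executed


def average_time(region):
    """Average execution time per run of the region (0.0 if it never ran)."""
    if region.executions == 0:
        return 0.0
    return region.total_time / region.executions


def select_candidates(regions, min_avg_time):
    """Keep regions whose average time is at least min_avg_time.

    min_avg_time is an assumed stand-in for parallelization overhead:
    regions that run for less time than that are unlikely to pay off.
    """
    return [r for r in regions if average_time(r) >= min_avg_time]


def summarize_dependences(events):
    """Count dependences per region, grouped by (kind, variable).

    events is an iterable of (region_name, kind, variable) tuples,
    where kind is one of "RAW", "WAR", or "WAW".
    """
    summary = defaultdict(Counter)
    for region_name, kind, variable in events:
        summary[region_name][(kind, variable)] += 1
    return summary


if __name__ == "__main__":
    # Invented sample data, not measurements from the paper.
    regions = [
        RegionProfile("loop_body", total_time=8.0, executions=4),
        RegionProfile("tiny_call", total_time=0.002, executions=1000),
    ]
    events = [
        ("loop_body", "RAW", "counter"),
        ("loop_body", "RAW", "counter"),
        ("loop_body", "WAW", "buf"),
    ]
    for region in select_candidates(regions, min_avg_time=0.01):
        print(region.name, average_time(region))
        for (kind, variable), count in summarize_dependences(events)[region.name].items():
            print("  ", kind, variable, count)
```

A report of this shape gives the count, type, and responsible variable for each dependence of a selected region, which is the kind of information the abstract says the method produces before the regions are handed to a speculative system such as BOP.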

Keywords: possibly parallel region; average execution time of regions; data dependence information; speculative parallelism

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
