SLP vectorization method based on throttling


Authors: Li Yingying [1,2]; Xi Huixing; Gao Wei [1,2]; Li Wei [4]; Zhai Shengwei

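Affiliations: [1] Information Engineering University, Zhengzhou 450002, China; [2] State Key Laboratory of Mathematical Engineering & Advanced Computing, Zhengzhou 450002, China; [3] Anshan Normal University, Anshan, Liaoning 114007, China; [4] The 27th Research Institute, China Electronics Technology Group Corporation, Zhengzhou 450047, China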

Source: Application Research of Computers, 2018, No. 9, pp. 2578-2582 (5 pages)

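Funding: National Natural Science Foundation of China (61472447); National "863" Program of China (2014AA01A300); National "Core Electronic Devices, High-end Generic Chips and Basic Software" Major Project of China (2013ZX0102-8001-001-001)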

Abstract: SIMD extensions are now widely integrated into general-purpose processors, where they boost performance and energy efficiency for multimedia and scientific applications. Compiler-based automatic vectorization is a major way of generating code that makes efficient use of the SIMD units. The superword level parallelism (SLP) algorithm is widely used in compilers and has become the main means of vectorizing code at the basic-block level. In SLP, choosing whether to vectorize is a one-off decision for the whole generated graph: the cost evaluation considers only the benefit of vectorizing the code region as a whole. This is sub-optimal because the graph may contain fragments whose vectorization benefit is negative, due to the need to move data from scalar registers into vectors, and these fragments lower the overall benefit. The decision does not consider the potential gain from throttling the graph by removing this harmful code. To overcome this limitation, the paper proposes throttling SLP (TSLP), a vectorization algorithm that finds the optimal subgraph to vectorize and removes the fragments with negative benefit. Experiments on standard benchmarks show that TSLP decreases execution time by 9% on average compared with the original SLP.
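
The throttling idea in the abstract can be illustrated with a small sketch. This is not the paper's algorithm, which the abstract does not spell out. It assumes a tree-shaped SLP graph in which every node carries a hypothetical `saving` (scalar cost minus vector cost) and a hypothetical `gather_cost` (the price of packing scalar values into a vector when that node is left scalar). All names and cost numbers below are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """One group of isomorphic scalar instructions in a hypothetical SLP graph."""
    name: str
    saving: int        # scalar cost minus vector cost; negative means vectorizing hurts
    gather_cost: int   # assumed cost of packing scalars into a vector if this node stays scalar
    operands: List["Node"] = field(default_factory=list)


def all_or_nothing(node: Node) -> int:
    """Net saving when the whole graph is vectorized (the one-off decision)."""
    return node.saving + sum(all_or_nothing(op) for op in node.operands)


def throttle(node: Node) -> Tuple[int, List[str]]:
    """Best net saving for the subtree rooted at `node`, given that `node`
    itself is vectorized, plus the names of the nodes kept vectorized."""
    total = node.saving
    kept = [node.name]
    for op in node.operands:
        sub_saving, sub_kept = throttle(op)
        if sub_saving > -op.gather_cost:
            # Vectorizing the operand subtree beats gathering its scalars.
            total += sub_saving
            kept += sub_kept
        else:
            # Cut the subtree here and pay for the gather instead.
            total -= op.gather_cost
    return total, kept


# Invented example: a store fed by an add whose operands are a load and a
# division that is expensive to vectorize.
load = Node("load", saving=3, gather_cost=2)
div = Node("div", saving=-8, gather_cost=2)
add = Node("add", saving=3, gather_cost=2, operands=[load, div])
store = Node("store", saving=3, gather_cost=0, operands=[add])

print(all_or_nothing(store))  # 1
print(throttle(store))        # (7, ['store', 'add', 'load'])
```

In this toy graph the whole-graph decision still vectorizes, but with a net saving of only 1, whereas cutting the `div` node and paying its gather cost yields a saving of 7. A real SLP graph is a DAG with shared nodes and target-specific costs, so an actual implementation would need more than this tree recursion.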

Keywords: SIMD extension; automatic vectorization; superword level parallelism (SLP); cost model

Classification: TP301.6 [Automation and Computer Technology: Computer System Architecture]

 
