Authors: Li Yingying (李颖颖) [1,2]; Xi Huixing (奚慧兴); Gao Wei (高伟) [1,2]; Li Wei (李伟) [4]; Zhai Shengwei (翟胜伟)
Affiliations: [1] Information Engineering University, Zhengzhou 450002, China; [2] State Key Laboratory of Mathematical Engineering & Advanced Computing, Zhengzhou 450002, China; [3] Anshan Normal University, Anshan, Liaoning 114007, China; [4] The 27th Research Institute, China Electronics Technology Group Corporation, Zhengzhou 450047, China
Source: Application Research of Computers (《计算机应用研究》), 2018, No. 9, pp. 2578-2582 (5 pages)
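Funding: National Natural Science Foundation of China (61472447); National High Technology Research and Development Program of China ("863" Program) (2014AA01A300); National Science and Technology Major Project on Core Electronic Devices, High-end Generic Chips and Basic Software (2013ZX0102-8001-001-001)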
Abstract: SIMD extension units are now widely integrated into general-purpose processors, where they improve performance and energy efficiency for multimedia and scientific applications. Compiler-based automatic vectorization is an important way to generate code that uses the SIMD units efficiently. The superword level parallelism (SLP) method is widely used in compilers and has become the main means of vectorizing code at the basic-block level. In SLP, whether to vectorize is a one-off decision for the whole graph that has been generated: the profitability evaluation considers only the benefit of vectorizing the code segment as a whole. This is sub-optimal, because the graph may contain code that is harmful to vectorization due to the need to move data from scalar registers into vectors, and such fragments with negative benefit lower the overall benefit. The decision does not consider the potential benefit of throttling the graph by removing this harmful code. To overcome this limitation, the paper proposes throttling SLP (TSLP), a vectorization method that finds the optimal subgraph to vectorize and removes the fragments whose vectorization benefit is negative. Experiments on standard benchmark programs show that, compared with the original SLP, TSLP reduces execution time by 9% on average.
Keywords: SIMD extension units; automatic vectorization; superword level parallelism (SLP); cost model
Classification number: TP301.6 [Automation and Computer Technology: Computer System Architecture]
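Note: the throttling idea in the abstract can be illustrated with a small sketch. This is not the paper's algorithm. It assumes, purely for illustration, that the SLP graph is a tree, that each node carries a hypothetical `saving` (scalar cost minus vector cost, negative when vectorizing that group hurts), and that cutting the graph at a node costs a hypothetical `gather_cost` to pack its scalar results into a vector. All names and numbers below are made up.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """One group of scalar instructions in a (hypothetical) SLP tree."""
    name: str
    saving: int        # scalar cost - vector cost; negative means vectorizing hurts
    gather_cost: int   # assumed cost of packing scalars into a vector if we cut here
    children: List["Node"] = field(default_factory=list)


def whole_graph_saving(node: Node) -> int:
    """Plain SLP-style estimate: sum the saving of every node in the graph."""
    return node.saving + sum(whole_graph_saving(c) for c in node.children)


def throttled_saving(node: Node) -> Tuple[int, List[str]]:
    """Best saving of a connected subgraph rooted at `node`, and the nodes kept.

    For each child we either keep its best subgraph, or cut the edge and
    pay the child's gather cost, whichever leaves the larger saving.
    """
    total = node.saving
    kept = [node.name]
    for child in node.children:
        child_total, child_kept = throttled_saving(child)
        if child_total >= -child.gather_cost:
            total += child_total
            kept += child_kept
        else:
            total -= child.gather_cost  # cut here: operands stay scalar
    return total, kept


# Made-up example: one operand group ("mix") is expensive to vectorize.
root = Node("store", 2, 0, [
    Node("mul", 1, 2, [
        Node("loadA", 1, 2),
        Node("mix", -5, 2),
    ]),
])

print(whole_graph_saving(root))   # -1: the all-or-nothing decision rejects the graph
print(throttled_saving(root))     # (2, ['store', 'mul', 'loadA'])
```

In this toy case the whole-graph estimate is negative, so an all-or-nothing decision would leave the code scalar, while cutting away the harmful group leaves a profitable subgraph. A real compiler cost model and graph shape would differ.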