基于位宽控制提高SIMD架构并行度的优化算法  被引量:5

Optimizing the SIMD Parallelism Through Bitwidth Analysis

在线阅读下载全文

作  者:张为华 朱嘉华[1] 张宏江[1] 臧斌宇[1] 

机构地区:[1]复旦大学并行处理研究所,上海200433

出  处:《计算机学报》2009年第11期2168-2177,共10页Chinese Journal of Computers

基  金:国家自然科学基金(60273046);博士点基金(20050246020)资助

摘  要:随着SIMD功能单元作为多媒体加速部件的广泛应用,如何有效利用这一构架优化应用程序成为编译优化研究的热点.目前典型的SIMD结构为同一操作对不同的数据位宽提供了不同的指令版本,随着操作数位宽的增加,对应的SIMD指令可同时完成的操作个数也随之降低.因此,如何有效识别操作数的有效位宽,对提高优化过程中SIMD指令内操作的并行度将产生至关重要的影响.文中针对SIMD优化面临的并行度问题,提出了一种优化算法,该算法在对操作数的有效位进行分析的基础上,进行溢出控制,从而减少操作数对宽位宽数据类型的依赖.实验数据表明,该算法可以有效提高多媒体程序优化的并行度,对多媒体程序获得较好的加速效果.Although the SIMD units have been widely used in different architecture designs, the automatic optimizations for such architectures are not well developed yet. Since most optimizations for SIMD architectures are transplanted from traditional vectorization techniques, many special features of SIMD architectures, such as packed operations, have not been thoroughly considered. While operands are tightly packed within a register, there is no spare space to indicate overflow. To maintain the accuracy of automatic SIMDized programs, the operands should be unpacked to preserve enough space for interim overflow. However, such a strategy would lead to great overhead. Moreover, the additional instructions for handling overflows can sometimes prevent other optimizations. In this paper, a new technique, BCSA (Bitwidth controlled SIMD arithmetic), is proposed to reduce the negative effects caused by interim overflow handling and eliminate the interference of interim overflows. The algorithm is applied to the multimedia benchmarks of Berkeley. The experimental results show that the algorithm can significantly improve the per formance of multimedia applications.

关 键 词:有效位控制 溢出处理 饱和算术 编译优化 并行度 

分 类 号:TP311[自动化与计算机技术—计算机软件与理论]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象