An Optimized Strategy for Collective Communication in Data Parallelism (一种数据并行中的群通信优化策略)  Cited by: 3

Authors: Wang Jue (王珏)[1], Hu Changjun (胡长军)[1], Zhang Jilin (张纪林)[1], Li Jianjiang (李建江)[1]

Affiliation: [1] School of Information Engineering, University of Science and Technology Beijing, Beijing 100083

Source: Chinese Journal of Computers (《计算机学报》), 2008, No. 2, pp. 318-328 (11 pages)

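Funding: Supported by the National High Technology Research and Development Program of China ("863" Program) (2006AA01Z105); the National Natural Science Foundation of China (60373008); and the Key Fund of the Ministry of Education (106019)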

Abstract: Collective communication is a key factor in the efficiency of large-scale data-parallel systems. It arises mainly in two situations: array redistribution between program phases, and array remapping after loop partitioning. Two factors that significantly affect the efficiency of a single collective communication are often overlooked: message contention and unequal message lengths, both of which cause substantial idle waiting among processes. Previous work either cannot completely avoid message contention or addresses only certain special cases. This paper proposes a more generally applicable communication scheduling strategy (CSS) for arrays distributed as Block_Cyclic(k). Based on a proof of the recursive theorem for communication table elements, the strategy generates a communication scheduling table in which each column, corresponding to one communication step, is a permutation of the receiving node numbers, and messages of similar size are placed in the same communication step as far as possible. The strategy therefore avoids inter-processor contention within a step and keeps the message lengths within a step as equal as possible, which minimizes both the schedule generation time and the actual communication time. Experimental results show that CSS clearly improves communication efficiency compared with traditional communication optimization algorithms and with the MPI_Alltoallv implementation.
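The two properties the abstract names (each communication step is a permutation of receivers, and message lengths within a step should be close) can be illustrated with a small sketch. This is not the paper's CSS algorithm, whose construction is not given in this record. It is a plain cyclic-shift baseline, and the function names, the array length `n`, and the block sizes `k_src` and `k_dst` are hypothetical choices made for illustration only.

```python
def owner(index, k, p):
    """Process owning global element `index` under Block_Cyclic(k) on p processes."""
    return (index // k) % p


def message_sizes(n, p, k_src, k_dst):
    """sizes[i][j] = number of elements process i sends to process j
    when an n-element array moves from Block_Cyclic(k_src) to Block_Cyclic(k_dst)."""
    sizes = [[0] * p for _ in range(p)]
    for g in range(n):
        sizes[owner(g, k_src, p)][owner(g, k_dst, p)] += 1
    return sizes


def cyclic_schedule(p):
    """Baseline schedule (an assumption, not CSS): in step s, sender i targets
    receiver (i + s) % p, so every step is a permutation of the receivers."""
    return [[(i + s) % p for i in range(p)] for s in range(p)]


def step_costs(sizes, schedule):
    """A step lasts as long as its longest message, so its cost is the
    maximum message size scheduled in that step."""
    return [max(sizes[i][dst] for i, dst in enumerate(step)) for step in schedule]


if __name__ == "__main__":
    p, n = 4, 48  # hypothetical process count and array length
    sizes = message_sizes(n, p, k_src=2, k_dst=3)
    schedule = cyclic_schedule(p)
    for s, step in enumerate(schedule):
        # No receiver appears twice in a step, so there is no node contention.
        assert sorted(step) == list(range(p))
        lengths = [sizes[i][dst] for i, dst in enumerate(step)]
        print(f"step {s}: receivers={step} lengths={lengths}")
    print("per-step cost:", step_costs(sizes, schedule))
```

The cyclic shift guarantees contention-free steps but does nothing to group messages of similar length, so the spread of `lengths` within a step shows the idle time that a length-aware schedule such as the one described in the abstract aims to reduce. Step 0 here pairs each process with itself, which corresponds to a local copy rather than a network message.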

Keywords: parallel compilation; data parallelism; collective communication; array redistribution; distributed memory

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
