MPI-RCDD: A Framework for MPI Runtime Communication Deadlock Detection  Cited by: 1


Authors: Hong-Mei Wei, Jian Gao, Peng Qing, Kang Yu, Yan-Fei Fang, Ming-Lu Li

Affiliations: [1] Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China; [2] Jiangnan Institute of Computing Technology, Wuxi 214083, China

Source: Journal of Computer Science & Technology (English edition), 2020, Issue 2, pp. 395-411 (17 pages)

Funding: This work was supported by the National Key Research and Development Program of China under Grant No. 2017YFB0202003.

Abstract: The message passing interface (MPI) has become a de facto standard programming model for high-performance computing, but its rich and flexible interface semantics make programs prone to communication deadlock, which seriously affects the usability of the system. However, existing detection tools for MPI communication deadlock are not scalable enough to keep pace with the continuous growth in system scale. In this context, we propose a framework for MPI runtime communication deadlock detection, namely MPI-RCDD, which contains three main mechanisms. Firstly, MPI-RCDD has a message logging protocol associated with deadlock detection to ensure that the communication messages required for deadlock analysis are not lost. Secondly, it uses the asynchronous processing thread provided by MPI to transfer dependencies between processes, so that multiple processes can participate in deadlock detection simultaneously, thus alleviating the performance bottleneck of centralized analysis. Thirdly, it uses an AND⊕OR model based algorithm named AODA to perform deadlock analysis. The AODA algorithm combines the advantages of both timeout-based and dependency-based deadlock analysis approaches, and allows processes in the timeout state to search for a deadlock cycle or knot during dependency transfer. Further, the AODA algorithm does not produce false positives and can identify the source of the deadlock accurately. The experimental results on typical MPI communication deadlock benchmarks such as the Umpire Test Suite demonstrate the capability of MPI-RCDD. Additionally, the experiments on the NPB benchmarks show a satisfactory performance cost, which indicates that MPI-RCDD has strong scalability.
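
To make the AND/OR dependency idea in the abstract concrete, the following is a minimal, centralized sketch of deadlock detection on a wait-for graph. It is not the AODA algorithm, which the abstract describes as distributed and timeout-triggered; the `WaitFor` layout and the `find_deadlocked` function are hypothetical names chosen for illustration. The sketch assumes that an AND wait (for example `MPI_Waitall`) needs every target to make progress, while an OR wait (for example a receive with `MPI_ANY_SOURCE`) needs only one.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: rank -> (mode, ranks it is blocked on).
# "AND": every target must progress; "OR": any single target suffices.
WaitFor = Dict[int, Tuple[str, Set[int]]]


def find_deadlocked(wait_for: WaitFor, all_ranks: Set[int]) -> Set[int]:
    """Return the ranks that can never be released (illustrative only)."""
    # Ranks without a wait-for entry are running and can make progress.
    can_progress = {r for r in all_ranks if r not in wait_for}
    changed = True
    while changed:  # iterate until no further rank can be released
        changed = False
        for rank, (mode, targets) in wait_for.items():
            if rank in can_progress:
                continue
            if mode == "AND":
                released = all(t in can_progress for t in targets)
            else:  # "OR"
                released = any(t in can_progress for t in targets)
            if released:
                can_progress.add(rank)
                changed = True
    return all_ranks - can_progress


# AND cycle: ranks 0 and 1 wait on each other while rank 2 keeps running.
cycle = {0: ("AND", {1}), 1: ("AND", {0})}
# OR wait with one live target: rank 0 can be released by rank 2.
or_live = {0: ("OR", {1, 2}), 1: ("AND", {0})}
# Knot: every rank reachable from the OR wait is itself blocked.
knot = {0: ("OR", {1, 2}), 1: ("AND", {0}), 2: ("AND", {0})}
```

Under these assumptions, the AND cycle leaves ranks 0 and 1 blocked, the OR wait with a live target leaves nothing blocked, and the knot blocks all three ranks. This is why an OR dependency requires a knot rather than a simple cycle to count as a deadlock.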

Keywords: high-performance computing; message passing interface (MPI); communication deadlock; deadlock detection; AND⊕OR model

Classification: TP39 [Automation and Computer Technology: Computer Application Technology]

 
