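N-dimension Torus: A System-level Run-time Fault-tolerant Structure

Original title: n维立方体:一种系统级运行时容错结构 (literally, "n-Dimensional Cube: A System-Level Run-Time Fault-Tolerant Structure")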

Authors: Lei Ming (雷鸣)[1], Mao Zhanggen (毛樟根)

Affiliation: [1] Jiangnan Institute of Computing Technology, Wuxi 214000, China

Source: 《高性能计算技术》 (High Performance Computing Technology), 2013, No. 4, pp. 1-13 (13 pages)

Abstract: This paper studies a multi-dimensional Multi-Cluster fault-tolerant management structure. High-performance computers are moving toward exascale computing, and homogeneous MPP systems built on an n-dimensional torus network are a key direction. System-level management requirements are growing more complex: a design must provide both high scalability and sustainable fault tolerance. From the software perspective, run-time partitionability is a focal point. The three CAP elements are hard to satisfy simultaneously, yet the cross-dimensional problem must still be confronted. This paper explores the idea of structure-inclusive consistency: the presentation layer perceives the three elements of consistency, availability, and partition tolerance; the middle layer implements the mapping and inclusion between the logical structure and the underlying physical structure; and the result is embedded in a scalable n-dimensional cube (matrix) management structure, in order to achieve hardware-protected, sustainable system-level fault tolerance.

Keywords: n-dimensional cube; Multi-Cluster; homogeneous MPP; system-level management; sustainable fault tolerance

Classification number: TP302.8 [Automation and Computer Technology: Computer System Architecture]

 
