DCFT-Kernel: Design and Implementation of a Cluster Fault-Tolerance Management System Based on Group Services  Cited by: 2

DCFT-Kernel: A Fault-Tolerant Cluster Middleware Based on Group Service

Authors: Huang Wei (黄伟)[1], Zhan Jianfeng (詹剑锋)[1], Fan Jianping (樊建平)[1]

Affiliation: [1] Institute of Computing Technology, Chinese Academy of Sciences

Source: Journal of Computer Research and Development (《计算机研究与发展》), 2005, No. 6, pp. 993-999 (7 pages)

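Funding: Major Special Project of the National High Technology Research and Development Program of China ("863" Program) (2002AA104410); Software Major Special Project of the National High Technology Research and Development Program of China ("863" Program) (2002AA1Z2102)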

Abstract: High availability and fault tolerance have become key criteria for evaluating cluster systems. As clusters grow larger, implementing fault-tolerant management software for large-scale clusters has become a hard technical problem. Building on group communication techniques from traditional distributed systems, and applying a divide-and-conquer approach to a complex system, this paper proposes the group services technique to address the scalability and high availability of fault-tolerant management software. The main idea of group services is to divide the cluster into several small partitions and make each partition fault-tolerant, so that the system as a whole is fault-tolerant. Combining group services with a real-time event service, a fault-tolerant management system named DCFT-Kernel is implemented on the DAWNING-4000A cluster system. The paper concentrates on the group services technique, introduces DCFT-Kernel and the main technical issues in implementing both, and gives a performance evaluation of DCFT-Kernel.

Keywords: scalability; group service; real-time event service; fault-tolerant management

Classification: TP302 [Automation and Computer Technology: Computer System Architecture]

 
