A Parallel Debugger Based on Cluster Operating System  Cited by: 2


Authors: Yan Chao (鄢超)[1], Liu Taoying (刘淘英)[2], Chen Guoliang (陈国良)[1]

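Affiliations: [1] Department of Computer Science and Technology, University of Science and Technology of China, Hefei 230027; [2] Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080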

Source: Journal of Computer Research and Development (《计算机研究与发展》), 2004, No. 4, pp. 630-636 (7 pages)

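Funding: National Natural Science Foundation of China (60073018); Youth Innovation Fund of the Institute of Computing Technology, Chinese Academy of Sciences (200162806)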

Abstract: Designing a parallel debugger is indispensable, yet still one of the most challenging parts of developing tools for parallel computing environments. This paper presents the design and implementation of DCDB3.0 (Dawning Cluster DeBugger), a parallel debugger realized on Dawning 3000 clusters. The debugger is part of the cluster operating system intended for the future Dawning 4000, and this is the first runnable version on Dawning 3000. DCDB3.0 uses a typical client/server structure. The client provides a user interface that visualizes otherwise tedious debugging information and operations. Clients can be located far from the large machine that provides the service. Remote communication relies on two components of the cluster operating system: DRPC (Dawning remote procedure call), which provides remote method invocation between the client end and the server end, and the task management module, which lets the client start the corresponding tasks on the server. The server end of DCDB3.0 handles the debugging processes, receiving debugging commands and sending back results. DCDB3.0 is designed to be scalable in functionality, so that advanced parallel debugging techniques can be studied and added on this platform. Replay, based on recording the senders of wildcard messages, has been implemented as an improvement over existing approaches; DSM debugging and other techniques are planned. Compared with earlier versions, DCDB3.0 is more powerful and more convenient for users.

Keywords: cluster operating system; DRPC; task management; replay

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]
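
Illustrative note: the abstract describes replay as being based on recording the senders of wildcard messages. The general idea can be sketched as follows. This is a minimal single-process simulation, not DCDB3.0's implementation; the names `Mailbox`, `run`, and `WILDCARD` are hypothetical, and the wildcard receive stands in for something like a receive posted with `MPI_ANY_SOURCE` in MPI, which the abstract does not name. In recording mode, the sender matched by each wildcard receive is logged. In replay mode, each receive is restricted to the logged sender, so the same matching order is reproduced even if messages arrive in a different order.

```python
WILDCARD = None  # hypothetical stand-in for a wildcard source


class Mailbox:
    """Pending messages for one process, stored as (sender, payload)."""

    def __init__(self):
        self.pending = []

    def deliver(self, sender, payload):
        self.pending.append((sender, payload))

    def receive(self, source=WILDCARD):
        """Return the first pending message from `source` (any sender if wildcard)."""
        for i, (sender, payload) in enumerate(self.pending):
            if source is WILDCARD or sender == source:
                return self.pending.pop(i)
        raise LookupError("no matching message")


def run(arrival_order, log=None):
    """Receive every delivered message.

    Recording mode (log is None): use wildcard receives and log each matched sender.
    Replay mode: force the k-th receive to the k-th logged sender.
    """
    box = Mailbox()
    for sender, payload in arrival_order:
        box.deliver(sender, payload)

    recording = log is None
    senders = [] if recording else list(log)
    received = []
    for k in range(len(arrival_order)):
        source = WILDCARD if recording else senders[k]
        sender, payload = box.receive(source)
        if recording:
            senders.append(sender)
        received.append((sender, payload))
    return received, senders


# Record one execution, then replay it against a different arrival order.
first_run, sender_log = run([(2, "b"), (1, "a"), (1, "c")])
replayed, _ = run([(1, "a"), (2, "b"), (1, "c")], log=sender_log)
```

Only the sender of each wildcard receive needs to be logged here, because receives that name a specific sender already match deterministically; this sketch also assumes messages from the same sender are delivered in order.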

 
