面向大规模集群的自动化监控系统  Cited by: 10

An automated monitoring system for large-scale supercomputers


Authors: YANG Jie (杨杰); ZENG Ling-bo (曾凌波); PENG Yun-yong (彭运勇); JIANG Qian-qian (蒋迁谦); DU Liang (杜量) (National Supercomputing Center in Guangzhou, Sun Yat-Sen University, Guangzhou 510000, China)

Affiliation: [1] National Supercomputing Center in Guangzhou, Sun Yat-Sen University, Guangzhou, Guangdong 510000, China

Source: Computer Engineering & Science (《计算机工程与科学》), 2020, No. 10, pp. 1801-1806 (6 pages)

Abstract: The number of nodes in large-scale cluster systems keeps growing and their internal structure is becoming more complex, which puts increasing pressure on cluster availability and stability. To address the availability and stability problems of large-scale clusters, as well as the difficulty of system management, operation, and maintenance, an automated monitoring system for large-scale clusters was implemented. The system is deployed on a large-scale cluster. It collects monitoring data from each cluster component and processes that data with microservices, achieving real-time monitoring of the cluster components.

Keywords: large-scale; cluster; monitoring; microservice

Classification: TP306 [Automation and Computer Technology: Computer System Architecture]

 
