Monitoring Management System of High-Performance Computing Cluster Based on Enterprise WeChat (基于企业微信的高性能集群监控管理系统)

Cited by: 3


Authors: FENG Wei (冯伟) [1]; JIANG Yuanfei (姜远飞) [1]

Affiliation: [1] Institute of Atomic and Molecular Physics, Jilin University, Changchun 130012, China

Source: Journal of Jilin University (Information Science Edition) (《吉林大学学报(信息科学版)》), 2023, No. 2, pp. 381-386 (6 pages)

Funding: Supported by the Jilin University 2019 Experimental Technology Fund (11974136).

Abstract: In high-performance cluster monitoring and management, anomaly detection is constrained by time and place, so cluster administrators may not notice faults promptly, which disrupts normal operation of the cluster. To address this, the open interfaces and message delivery mechanism of Enterprise WeChat are combined with cluster monitoring and management methods for the Linux (GNU/Linux) operating system to develop a simple, easy-to-use, and readily extensible monitoring and management system suited to small and medium-sized clusters, with early-warning information presented on mobile phones. The paper describes the system requirements, the system framework and functional design, the technical framework and data flow, and the specific process of deployment and development. The system has been completed and is used for day-to-day cluster management at the Institute of Atomic and Molecular Physics, Jilin University, where it has achieved good results. Without logging in to cluster nodes, administrators and users can monitor the cluster's software and hardware performance and job completion status through a mobile app, which allows timely follow-up. During the COVID-19 pandemic in particular, when staff worked from home and cluster access was inconvenient, this capability supported efficient research work at the institute.
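The alerting flow summarized in the abstract (collect metrics on a Linux node, detect an anomaly, push a message to a phone through Enterprise WeChat) might look roughly like the sketch below. This is an illustration only, not the authors' implementation: the function names, metric names, and thresholds are hypothetical, and the two HTTP endpoints are assumed to match the publicly documented Enterprise WeChat server API (`gettoken` and `message/send`). The corp ID, secret, and agent ID are placeholders that a real deployment would have to supply.

```python
import json
import os
import urllib.parse
import urllib.request

# Assumed base URL of the Enterprise WeChat server API.
API_BASE = "https://qyapi.weixin.qq.com/cgi-bin"


def read_metrics():
    """Collect two simple node metrics with the standard library (Linux/Unix only)."""
    load1, _, _ = os.getloadavg()
    stat = os.statvfs("/")
    disk_used_pct = 100.0 * (1.0 - stat.f_bavail / stat.f_blocks)
    return {"load1": load1, "disk_used_pct": disk_used_pct}


def find_alerts(metrics, limits):
    """Return one alert string for each metric that exceeds its limit."""
    alerts = []
    for name, limit in sorted(limits.items()):
        value = metrics.get(name)
        if value is not None and value > limit:
            alerts.append(f"{name}={value:.1f} exceeds limit {limit:.1f}")
    return alerts


def build_text_message(agent_id, to_user, content):
    """Build the JSON body for a plain-text application message."""
    return {
        "touser": to_user,
        "msgtype": "text",
        "agentid": agent_id,
        "text": {"content": content},
    }


def send_alert(corp_id, corp_secret, agent_id, to_user, content):
    """Fetch an access token, then post the message (requires network access)."""
    query = urllib.parse.urlencode({"corpid": corp_id, "corpsecret": corp_secret})
    with urllib.request.urlopen(f"{API_BASE}/gettoken?{query}", timeout=10) as resp:
        token = json.load(resp)["access_token"]
    body = json.dumps(build_text_message(agent_id, to_user, content)).encode("utf-8")
    request = urllib.request.Request(
        f"{API_BASE}/message/send?access_token={token}",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.load(resp)
```

In a deployment along these lines, a scheduled job (for example, via cron) could call `read_metrics()`, pass the result to `find_alerts()` with site-specific limits, and call `send_alert()` only when the returned list is non-empty.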

Keywords: Enterprise WeChat; high-performance computing cluster; performance monitoring; job management; message delivery

Classification: TP393 [Automation and Computer Technology: Computer Application Technology]

 
