Beacon+: A scalable lightweight end-to-end I/O performance monitoring, analysis and diagnosis system for exascale supercomputers

Cited by: 2

Authors: YANG Bin (杨斌); WANG Jing-yu (王敬宇); LIU Shi-chao (刘世超); SHAO Ming-shan (邵明山); XIAO Wei (肖伟); CHEN Qi (陈起); HE Xiao-bin (何晓斌); LIU Wei-guo (刘卫国); XUE Wei (薛巍)

Affiliations: [1] School of Software, Shandong University, Jinan 250101, Shandong; [2] National Supercomputing Center in Wuxi, Wuxi 214072, Jiangsu; [3] National Research Center of Parallel Computer Engineering & Technology, Beijing 100080; [4] Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China

Source: Computer Engineering & Science (《计算机工程与科学》), 2022, No. 9, pp. 1521-1531 (11 pages)

Funding: National Key Research and Development Program of China (2020YFA0607900).

Abstract: With the exascale computing barrier broken, high-performance computing has entered a new era. To meet the growing demand for data access, emerging technologies and storage media have been adopted in supercomputers, which makes their architectures increasingly complex and makes performance anomalies and system hotspots difficult to locate. To this end, Beacon+, a scalable, lightweight, end-to-end I/O performance monitoring, analysis and diagnosis system for exascale supercomputers, is designed and implemented. It monitors and analyzes the data access process of each application along the full I/O path in real time, without modifying application code or scripts. Through combined online and offline compression and distributed caching/storage mechanisms, Beacon+ remains highly scalable and low-overhead while providing I/O diagnosis services continuously and stably. With the new-generation Sunway supercomputer as the deployment platform, standard I/O benchmark applications and real-world applications demonstrate Beacon+'s low overhead and high accuracy, as well as the efficiency of its I/O diagnosis.

Keywords: I/O monitoring; data compression; I/O diagnosis; anomaly detection; performance bottleneck optimization

Classification number: TP306 [Automation and Computer Technology: Computer System Architecture]

 
