HTDcr: a job execution framework for high-throughput computing on supercomputers


Authors: Jiazhi JIANG, Dan HUANG, Hu CHEN, Yutong LU, Xiangke LIAO

Affiliations: [1] School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China; [2] School of Software Engineering, South China University of Technology, Guangzhou 510006, China

Source: Science China (Information Sciences) (English edition), 2024, Issue 1, pp. 65-81 (17 pages)

Funding: Supported by the National Key R&D Program of China (Grant No. 2021YFB0301300); the National Natural Science Foundation of China (Grant No. U1811461); Zhejiang Lab (Grant No. 2021KC0AB04); the Major Program of Guangdong Basic and Applied Research (Grant No. 2019B030302002); and the Program for Guangdong Introducing Innovative and Entrepreneurial Teams (Grant No. 2016ZT06D211).

Abstract: High-throughput computing (HTC) is a computing paradigm that aims to accomplish jobs by easily breaking them into smaller, independent components. However, it requires a large amount of computing power over a long period. Most existing HTC frameworks are job-oriented and lack support for coscheduling with the hardware architecture and for task-level execution. In addition, most of these frameworks reach only a limited scale, and their usability needs further improvement. Herein, we present HTDcr, a job execution framework for HTC on supercomputers. This study aims to improve the throughput, task dispatching, and usability of the framework. In detail, the throughput optimizations include a carefully designed task management system, a hierarchical scheduler, and the co-optimization of the task-scheduling strategy with the application and hardware characteristics. The usability optimizations include a programmable execution workflow, mechanisms for more robust and reliable service quality, and a fine-grained resource allocation system for the colocation of multiple jobs. According to our evaluations, HTDcr achieves outstanding scalability and high throughput on large-scale clusters for HTC workloads. We evaluate HTDcr with several microbenchmarks and real-world applications on Tianhe-2 and Sunway TaihuLight to demonstrate the effects of its design mechanisms. For instance, for two real-world applications, task scheduling integrated with the application and hardware characteristics achieves 1.7× and 1.9× speedups over the basic task-scheduling strategy.

Keywords: high-throughput computing; supercomputer; task scheduling; middleware; password guessing

Classification: TP338.4 [Automation and Computer Technology: Computer System Architecture]

 
