Kronos: towards bus contention-aware job scheduling in warehouse scale computers

Cited by: 1



Authors: Shuai XUE, Shang ZHAO, Quan CHEN, Zhuo SONG, Shanpei CHEN, Tao MA, Yong YANG, Wenli ZHENG, Minyi GUO

Affiliations: [1] Shanghai Jiao Tong University, Shanghai 200240, China; [2] Alibaba Group, Hangzhou 311121, China

Source: Frontiers of Computer Science (Chinese title: 中国计算机科学前沿, English edition), 2023, Issue 1, pp. 1-14 (14 pages)

Funding: This work is sponsored by the National R&D Program of China (2018YFB1004800), the National Natural Science Foundation of China (Grant Nos. 62022057, 61632017, 61832006), and Alibaba Group. Quan Chen and Minyi Guo are the corresponding authors. We thank Chao Qian for his collaborative effort during data collection. We also thank the anonymous reviewers for their helpful comments on earlier drafts of the manuscript.

Abstract: While researchers have proposed many techniques to mitigate contention on the shared cache and memory bandwidth, none of them considers memory bus contention caused by split locks. Our study shows that split locks can make data access latency 9x longer without saturating the memory bandwidth. To minimize the impact of split locks, we propose Kronos, a runtime system composed of an online bus contention tolerance meter and a bus contention-aware job scheduler. The meter characterizes each job's tolerance to the “pressure” of bus contention and builds a tolerance model using polynomial regression. The job scheduler allocates user jobs to physical nodes in a contention-aware manner. We design three scheduling policies, which respectively (1) minimize the number of required nodes while ensuring the Service Level Agreement (SLA) of all user jobs, (2) minimize the number of jobs that suffer SLA violations when there are not enough nodes, and (3) maximize overall performance without considering SLA violations. With these three policies, Kronos reduces the number of required nodes by 42.1% while ensuring the SLA of all jobs, reduces the number of jobs that suffer SLA violations when there are not enough nodes by 72.8%, and improves overall performance by 35.2% when SLA is not considered.
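The abstract does not spell out the tolerance model or the scheduling algorithm. As a rough illustration only, the Python sketch below assumes three things. First, each job's slowdown is a degree-2 polynomial of the total bus "pressure" exerted by its co-runners, fitted with ordinary least squares. Second, pressures from co-located jobs simply add up. Third, jobs are packed first-fit-decreasing so that no co-located job's predicted slowdown exceeds its SLA bound, loosely resembling the first (node-minimizing) policy. The `Job` fields, the additive pressure metric, the `max_jobs` per-node cap, and the sample numbers are all hypothetical and are not Kronos's actual design.

```python
from dataclasses import dataclass
from typing import List


def fit_polynomial(xs: List[float], ys: List[float], degree: int = 2) -> List[float]:
    """Ordinary least-squares polynomial fit via the normal equations.

    Returns coefficients [c0, c1, ..., c_degree], lowest power first.
    """
    n = degree + 1
    # Normal equations: A c = b with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for k in range(col, n):
                a[r][k] -= factor * a[col][k]
            b[r] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(a[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (b[i] - rest) / a[i][i]
    return coeffs


def predict(coeffs: List[float], x: float) -> float:
    """Evaluate the polynomial sum(c_i * x^i)."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


@dataclass
class Job:
    name: str
    pressure: float          # hypothetical bus "pressure" this job exerts
    tolerance: List[float]   # slowdown = predict(tolerance, co-runner pressure)
    sla_slowdown: float      # largest slowdown the job's SLA allows (hypothetical)


def fits(node: List[Job], job: Job, max_jobs: int) -> bool:
    """True if adding `job` keeps every job on `node` within its SLA bound."""
    group = node + [job]
    if len(group) > max_jobs:
        return False
    total = sum(j.pressure for j in group)
    # Assumption: a job's co-runner pressure is the sum of the others' pressure.
    return all(predict(j.tolerance, total - j.pressure) <= j.sla_slowdown
               for j in group)


def schedule_min_nodes(jobs: List[Job], max_jobs: int = 4) -> List[List[Job]]:
    """First-fit decreasing packing that opens a new node only when needed."""
    nodes: List[List[Job]] = []
    for job in sorted(jobs, key=lambda j: j.pressure, reverse=True):
        for node in nodes:
            if fits(node, job, max_jobs):
                node.append(job)
                break
        else:
            # No existing node can host the job safely, so open a new one.
            nodes.append([job])
    return nodes


# Usage with made-up profiling samples: slowdown = 1 + 0.1p + 0.05p^2.
samples_p = [0.0, 1.0, 2.0, 3.0, 4.0]
samples_slowdown = [1.0, 1.15, 1.4, 1.75, 2.2]
model = fit_polynomial(samples_p, samples_slowdown)

jobs = [Job(name, p, model, 1.5)
        for name, p in [("A", 2.0), ("B", 1.5), ("C", 1.0), ("D", 0.5)]]
placement = schedule_min_nodes(jobs)
print([[j.name for j in node] for node in placement])  # → [['A', 'B'], ['C', 'D']]
```

With these made-up numbers the sketch uses two nodes. Adding C to the first node would raise A's co-runner pressure to 2.5 (predicted slowdown about 1.56). Adding D would do the same to B. Both exceed the 1.5 bound. First-fit decreasing is used here only as a simple bin-packing heuristic; the paper's policies may work differently.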

Keywords: bus contention; split lock; scheduling; high-performance cloud

CLC classification: N12 [General Natural Sciences]; R54 [Medicine and Health: Cardiovascular Diseases]
