Flexible configuration update delivery for large clusters  Cited by: 1


Authors: Zhen TANG[1,2]; Wei WANG; Yu HUANG[4]; Yanlin LI; Shuping JI[6]; Ao SONG; Jun WEI; Tao HUANG

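Affiliations: [1] State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China [2] University of Chinese Academy of Sciences, Beijing 100049, China [3] Institute of Software Technology, Chinese Academy of Sciences, Nanjing 211135, China [4] State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China [5] Alibaba Group, Hangzhou 311121, China [6] Electrical & Computer Engineering, University of Toronto, Toronto ON M5S 3G4, Canada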

Source: Scientia Sinica (Informationis), 2020, No. 11, pp. 1645-1664 (20 pages)

Funding: National Key Research and Development Program of China (Grant No. 2017YFB1001804); Alibaba Innovative Research (AIR) program.

Abstract: Configuration management is essential infrastructure for cloud service providers that manage large-scale container clusters. Because such a cluster may contain millions of container instances, it is challenging to deliver configuration updates reliably and in time to the containers interested in them while meeting the requirements of different business scenarios. Existing solutions have limitations: consensus algorithms that guarantee sequential consistency limit the scalability of the cluster and are hard to apply to large clusters; anti-entropy (epidemic) algorithms suffer from long-tail latency, so delivery time is hard to guarantee and they are unsuitable for critical configuration updates. To address these challenges, this paper presents a flexible approach for delivering configuration updates in large-scale clusters. The approach is based on the publish/subscribe mechanism. It introduces configurable, multi-level (two-phase) delivery over a complete N-ary tree overlay and uses a portion of the subscribers' computing resources to assist delivery and improve performance. It also introduces fault-tolerance mechanisms to handle node failures and network partitions, keeping reads and writes available in multiple partitions during a network partition. The overlay parameters and strategies can be adjusted to meet the differing performance and reliability requirements of each scenario. Experimental results show that, compared with existing solutions, the approach effectively reduces update delivery latency and copes well with node failures and network partitions.
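As an illustration of why a complete N-ary tree overlay keeps delivery paths short, the sketch below indexes subscribers in level order and counts forwarding hops from the root. It is a minimal sketch under stated assumptions, not the paper's algorithm: the level-order indexing and the names `children`, `parent`, and `delivery_hops` are hypothetical, and fault tolerance is not modelled.

```python
from collections import deque
from typing import List, Optional


def children(i: int, n: int, size: int) -> List[int]:
    """Children of node i in a complete n-ary tree with `size` nodes.

    Assumption: nodes are numbered in level order, root = 0.
    """
    first = n * i + 1
    return list(range(first, min(first + n, size)))


def parent(i: int, n: int) -> Optional[int]:
    """Parent of node i, or None for the root."""
    return None if i == 0 else (i - 1) // n


def delivery_hops(n: int, size: int) -> List[int]:
    """Hops an update needs to reach each node when every node
    forwards it to its own children (breadth-first from the root)."""
    hops = [0] * size
    queue = deque([0])
    while queue:
        node = queue.popleft()
        for child in children(node, n, size):
            hops[child] = hops[node] + 1
            queue.append(child)
    return hops
```

With fan-out `n`, the deepest node is reached in a number of hops that grows logarithmically with the cluster size, which is the property a tree overlay trades against the extra forwarding work done by inner nodes.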

Keywords: update delivery; configuration management; complete N-ary tree

Classification: TP393.0 [Automation and Computer Technology: Computer Application Technology]

 
