Affiliations: [1] National Laboratory for Parallel and Distributed Processing, National University of Defense Technology, Changsha 410000, China; [2] Shenzhen Research Institute, The Chinese University of Hong Kong, Shenzhen 518000, China; [3] Computing Platform, Alibaba Cloud Computing Company, Hangzhou 310000, China
Source: Science China (Information Sciences) [中国科学(信息科学)(英文版), English edition], 2012, Issue 12, pp. 2757-2773 (17 pages)
Funding: National Basic Research Program of China (Grant No. 2011CB302600); National High Technology Research and Development Program of China (Grant No. 2012AA011201); National Natural Science Foundation of China (Grant Nos. 61161160565, 90818028, 91118008, 60903043, 61100077); Basic Research Program of Shenzhen (Grant No. JC201104220300A); Research Grants Council of Hong Kong (Project No. N_CUHK405/11)
Abstract: It is hard to localize the primary cause of performance anomalies in cloud computing systems because of the complex interactions between components. The hidden connections among the huge number of request execution paths in such systems usually contain useful information for diagnosing performance anomalies. We propose an approach that localizes anomalous invoked methods and their physical locations by leveraging request trace logs. It involves two steps: (1) cluster the requests according to their call sequences, identify anomalous requests with principal component analysis, and then pick out anomalous methods with the Mann-Whitney hypothesis test; (2) compare the behavioral similarity of all replicated instances of the anomalous methods using the Jensen-Shannon divergence, and select the instances whose behavior differs from that of the others as the final culprits of the performance anomalies. We validate our approach with experiments on four real-world cases at Alibaba Cloud Computing Inc. The results demonstrate that our approach can locate the primary causes of performance anomalies with low false-positive and false-negative rates.
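The statistical building blocks named in the abstract can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation. Requests are grouped by exact call sequence in place of a real clustering step. The PCA-based detection of anomalous requests is omitted. The Mann-Whitney U test uses a two-sided normal approximation. A replicated instance is flagged when its mean pairwise Jensen-Shannon divergence (base 2, so bounded by 1) from the other replicas exceeds a threshold of 0.5, computed over latency bins. The threshold, the bin edges, and all function names are hypothetical.

```python
import bisect
import math
from collections import defaultdict
from statistics import NormalDist


def group_by_call_sequence(requests):
    """Group request IDs by identical call sequence (hypothetical stand-in for clustering)."""
    groups = defaultdict(list)
    for req_id, calls in requests.items():
        groups[tuple(calls)].append(req_id)
    return dict(groups)


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation (no tie correction).

    Returns (U statistic for sample x, approximate p-value).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    # Assign 1-based ranks, averaging the ranks of tied values.
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return u, 2 * (1 - NormalDist().cdf(abs(z)))


def histogram(samples, edges):
    """Normalized latency histogram; bin b covers [edges[b], edges[b+1]), last bin open-ended."""
    counts = [0] * len(edges)
    for s in samples:
        counts[max(0, bisect.bisect_right(edges, s) - 1)] += 1
    return [c / len(samples) for c in counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 for identical, 1 for disjoint support)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def divergent_instances(latencies_by_instance, edges, threshold=0.5):
    """Flag replicas whose mean JS divergence from the other replicas exceeds threshold.

    The threshold and the mean-of-pairwise rule are hypothetical choices.
    """
    hists = {name: histogram(v, edges) for name, v in latencies_by_instance.items()}
    flagged = []
    for name, h in hists.items():
        others = [js_divergence(h, g) for other, g in hists.items() if other != name]
        if others and sum(others) / len(others) > threshold:
            flagged.append(name)
    return flagged
```

For example, running `mann_whitney_u` on one method's latencies from normal requests versus anomalous requests yields a p-value. Methods with small p-values would then have the latencies of their replicated instances passed to `divergent_instances`. Comparing each replica only against its peers means a replica is flagged for deviating from its siblings, not for being slow in absolute terms.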
Keywords: cloud computing systems; performance anomalies; request trace logs; fault localization