Authors: YU Qing-Yang (于庆洋); BAI Xiao-Ying (白晓颖); LI Ming-Jie (李明杰); LI Qi-Yuan (李奇原); LIU Tao (刘涛); LIU Ze-Yin (刘泽胤); PEI Dan (裴丹) [1,2]
Affiliations: [1] Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China; [2] Beijing National Research Center for Information Science and Technology, Beijing 100084, China; [3] Advanced Institute of Big Data, Beijing 100083, China; [4] Department of Commercial Platform, Baidu Inc., Beijing 100193, China
Source: Journal of Software (《软件学报》), 2022, No. 5, pp. 1849-1864 (16 pages)
Funding: National Key Research and Development Program of China (2019YFB1802504, 2019YFE0105500); National Natural Science Foundation of China (62072264).
Abstract: A large microservice system has many components with complex dependencies among them. Because faults propagate in a ripple effect, a single fault can cause large-scale service anomalies, so quickly identifying anomalies and locating their root causes is key to assuring service quality. Invocation-chain analysis, the method mainly used today, often faces complex chain structures, a vast number of instances, and many structures with only a small number of samples. To address these problems, the study uses control-flow analysis of invocation chains to aggregate the large number of chain structures into a small number of method invocation models. It further proposes an execution-time decomposition model and prediction method built on the method invocation models; data under test is judged anomalous when the relative error between its actual value and the predicted value exceeds a set threshold. Experiments used one day of invocation-chain logs, over 1.7 billion records, from the Baidu PhoenixNest (凤巢) advertising system. The results show that, compared with data-driven invocation-sequence analysis methods, the proposed model-based approach greatly reduces the number of invocation-chain structures while effectively analyzing and detecting microservice performance anomalies and their root causes.
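The thresholding step described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the choice to normalize by the predicted value, and the threshold of 0.5 are all assumptions made for illustration, and the predicted execution times are taken as given inputs rather than derived from the paper's execution-time decomposition model.

```python
from typing import List, Sequence


def relative_error(actual: float, predicted: float) -> float:
    """Relative error of an observed execution time against its prediction.

    Assumption: the error is normalized by the predicted value; the
    abstract does not specify the normalization.
    """
    if predicted == 0:
        raise ValueError("predicted execution time must be non-zero")
    return abs(actual - predicted) / abs(predicted)


def flag_anomalies(
    actual: Sequence[float],
    predicted: Sequence[float],
    threshold: float = 0.5,  # hypothetical value, not taken from the paper
) -> List[int]:
    """Return indices of samples whose relative error exceeds the threshold."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return [
        i
        for i, (a, p) in enumerate(zip(actual, predicted))
        if relative_error(a, p) > threshold
    ]


if __name__ == "__main__":
    # Made-up execution times in milliseconds, for illustration only.
    observed_ms = [100.0, 105.0, 240.0, 98.0]
    predicted_ms = [100.0, 100.0, 100.0, 100.0]
    print(flag_anomalies(observed_ms, predicted_ms))  # prints [2]
```

In this sketch only the third sample is flagged, since its observed time deviates from the prediction by 140 percent while the others stay within 5 percent.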
Keywords: microservice system; performance anomaly detection; root cause analysis; invocation chain; control-flow analysis
Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]