Anomaly Propagation Based Fault Diagnosis for Microservices  Cited by: 3


Authors: WANG Tao [1,2]; ZHANG Shu-dong [3]; LI An; SHAO Ya-ru; ZHANG Wen-bo [1,2]

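Affiliations: [1] Institute of Software, Chinese Academy of Sciences, Beijing 100190, China [2] State Key Laboratory of Computer Sciences, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China [3] Information Engineering College, Capital Normal University, Beijing 100048, China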

Source: Computer Science (《计算机科学》), 2021, No. 12, pp. 8-16 (9 pages)

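Funding: National Key R&D Program of China (2017YFB1400804); National Natural Science Foundation of China (61872344); Beijing Natural Science Foundation (4182070); Youth Innovation Promotion Association of the Chinese Academy of Sciences, talent program (2018144).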

Abstract: A microservice architecture splits a large, complex application into multiple independently deployable microservices, which may use different technology stacks and cooperate through lightweight communication mechanisms, enabling agile development and continuous delivery. However, such an application contains many microservices with complex invocation relationships, so a fault in one microservice can cause the microservices that interact with it to behave anomalously as well, which greatly increases the likelihood of application failure. Given many anomalous microservices and the fact that anomalies propagate, efficiently and accurately locating the faulty microservice that triggered them has become a pressing problem and a key to ensuring the reliability of microservice-based applications. To address it, this paper proposes an anomaly-propagation-oriented fault diagnosis method for microservices. First, the method monitors microservice metrics and the invocation behavior between microservices. Then, it uses regression analysis to build regression models between metrics and API calls in order to detect anomalous microservices. In parallel, it constructs a microservice dependency graph to characterize how anomalies propagate between microservices. Finally, it derives a fault propagation subgraph from the service dependency graph and the set of anomalous services, and applies a PageRank algorithm to compute the anomaly degree of each microservice and identify the most likely root cause, that is, the faulty microservice. Experimental results show that the method detects anomalous services effectively and diagnoses faulty microservices accurately, with low overhead.
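The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the regression form, the anomaly threshold, or the PageRank edge weighting, so everything below is an assumption made for illustration. In particular, the sketch assumes a single linear model shared by all services, a fixed residual tolerance, unweighted edges pointing from caller to callee (so that rank accumulates on the services that anomalous callers depend on), and hypothetical service names and metric values.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a * x + b (assumed model form)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def is_anomalous(model, calls, metric, tolerance):
    """Flag a service whose metric deviates from the value predicted by its call count."""
    a, b = model
    return abs(metric - (a * calls + b)) > tolerance


def propagation_subgraph(call_graph, anomalous):
    """Keep only call edges whose caller and callee are both anomalous."""
    return {u: [v for v in call_graph.get(u, []) if v in anomalous]
            for u in anomalous}


def pagerank(edges, nodes, damping=0.85, iters=100):
    """Plain power-iteration PageRank; dangling mass is spread uniformly."""
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        dangling = sum(rank[v] for v in nodes if not edges.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for u in nodes:
            outs = edges.get(u, [])
            for v in outs:
                new[v] += damping * rank[u] / len(outs)
        rank = new
    return rank


def rank_root_causes(call_graph, anomalous):
    """Order anomalous services from most to least likely root cause."""
    scores = pagerank(propagation_subgraph(call_graph, anomalous), anomalous)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy system: caller -> list of callees.
call_graph = {"gateway": ["order", "user"], "order": ["inventory"],
              "user": [], "inventory": []}

# Model learned from normal operation: CPU usage as a function of API calls.
model = fit_line([10, 20, 30, 40], [12, 22, 32, 42])

# Current observations per service: (API calls, CPU usage).
observed = {"gateway": (25, 60), "order": (25, 55),
            "user": (25, 28), "inventory": (25, 90)}

anomalous = {name for name, (calls, cpu) in observed.items()
             if is_anomalous(model, calls, cpu, tolerance=5.0)}
ranking = rank_root_causes(call_graph, anomalous)
print(ranking)  # prints ['inventory', 'order', 'gateway']
```

In this toy run, `user` stays within tolerance and is excluded from the subgraph, while `inventory`, the service at the end of the anomalous call chain, receives the highest score. A real deployment would more plausibly fit one model per service and per metric and would choose edge weights from observed call behavior.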

Keywords: fault diagnosis; microservice; service invocation; metric correlation; anomaly propagation

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
