Survey of Machine Learning-Based KPI Anomaly Detection on Internet-Based Services


Authors: Shang Shuyi; Li Hongjia[1]; Song Chen[1]; Lu Zhitong; Wang Liming[1]; Xu Zhen[1]

Affiliations: [1] Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093; [2] School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049

Source: Journal of Computer Research and Development (计算机研究与发展), 2025, No. 1, pp. 207-231 (25 pages)

Funding: Research Project on 5G Terminal Security and Control Technologies (E3V1581).

Abstract: Key performance indicator (KPI) anomaly detection is a fundamental supporting technology for artificial intelligence for IT operations (AIOps) of Internet-based services. To improve the efficiency and accuracy of KPI anomaly detection, machine learning-based approaches have become a research hotspot in both academia and industry in recent years. Based on a comprehensive analysis of prior work, we first present a technical framework of KPI anomaly detection for Internet-based services. Then, for three KPI types (univariate KPI, multivariate KPIs, and matrix-variate KPIs), we examine the motivation behind the choice of machine learning models from the perspective of mining KPI dependency patterns in different domains (time domain, metric domain, and entity domain). Furthermore, guided by detection performance objectives, we describe in detail two classes of techniques: accuracy-centric techniques, which focus on improving the accuracy of KPI anomaly detection models, and multi-objective balancing-centric techniques, which focus on balancing theoretical performance against practical application objectives. Finally, we summarize the challenges facing machine learning-based KPI anomaly detection in five areas (KPI monitoring and pre-processing, model generality, model interpretability, anomaly alarm management, and the inherent limitations of the KPI anomaly detection task) and point out the corresponding potential research directions.

Keywords: Internet-based services; anomaly detection; key performance indicator (KPI); machine learning; AIOps

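Classification codes: TP274 [Automation and Computer Technology: Detection Technology and Automatic Equipment]; TP181 [Automation and Computer Technology: Control Science and Engineering]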

 
