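Affiliations: [1] School of Information Science and Technology, Northwest University, Xi'an 710119; [2] School of Computer Science, Xi'an University of Posts and Telecommunications, Xi'an 710121; [3] Shaanxi Key Laboratory of Network Data Intelligent Processing, Xi'an University of Posts and Telecommunications, Xi'an 710121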
Source: Computer Science (《计算机科学》), 2022, No. 12, pp. 125-135 (11 pages)
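Funding: Shaanxi Province Science and Technology Research Project (2016GY-123); Shaanxi Province Key Research and Development Program (2020GY-210); Henan Province Industrial Science and Technology Research Project (212102210418); National Natural Science Foundation of China (61272286).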
Abstract: Reliability, usability, and security are three important indicators of software quality, and reliability is the most important of them. Traditional software reliability evaluation treats the software system as a whole, or treats its invocation structure as static. Software architecture has since changed considerably: characteristics such as autonomy, collaboration, evolution, dynamism, and self-adaptation now pervade complex network-structured software systems, and traditional reliability evaluation and prediction methods no longer suit software systems in this environment. In today's fast-moving information society, where "software defines everything", massive information systems generate large-scale data resources. The heterogeneity, parallelism, complexity, and sheer scale of modern information systems make log resources diverse and complex, so accurate analysis and fault prediction based on system logs are particularly important for building safe and reliable systems. The existing literature offers many techniques for fault prediction and software reliability, but few measure real-time software reliability for massive logs and complex components. This paper systematically analyzes the whole log-processing pipeline, from log parsing, feature extraction, fault detection, and prediction evaluation to real-time reliability calculation, and uses an ensemble learning model to analyze massive system logs and predict faults. Compared with traditional machine learning methods, the model improves the accuracy, recall, and F1 score of fault prediction. Where prediction recall is low, the recall is used to correct the real-time reliability evaluation, which considerably improves the precision of the real-time reliability. Finally, starting from the reliability of individual components, a system reliability measure based on Markov theory gives the reliability of composite microservice components, providing an accurate data basis and fault localization evidence for intelligent operations and maintenance.
Keywords: log parsing; fault detection; reliability evaluation; root cause analysis; ensemble learning; complex components
Classification number: TP311.5 [Automation and Computer Technology: Computer Software and Theory]
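Note: the abstract's last step, deriving the reliability of a composite microservice component from individual reliabilities using Markov theory, can be illustrated with a minimal sketch. The record does not give the paper's formulation, so this uses the classic Cheung-style absorbing Markov chain as a stand-in. The function name `composite_reliability`, the three-service call graph, and all numeric values are hypothetical.

```python
def composite_reliability(reliability, transitions, start=0):
    """Probability that a request entering at `start` completes without failure.

    reliability[i]    : probability that service i handles a call correctly
    transitions[i][j] : probability that control passes from service i to j;
                        any remaining mass (1 - row sum) means the request
                        finishes successfully after service i.
    Assumption: Cheung-style model, not necessarily the paper's own method.
    """
    n = len(reliability)
    # x[i] = reliability[i] * (exit_i + sum_j transitions[i][j] * x[j]),
    # rewritten as the augmented linear system (I - Q) x = b.
    a = []
    for i in range(n):
        exit_prob = 1.0 - sum(transitions[i])
        row = [(1.0 if i == j else 0.0) - reliability[i] * transitions[i][j]
               for j in range(n)]
        row.append(reliability[i] * exit_prob)
        a.append(row)
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular system: check the transition matrix")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                for c in range(col, n + 1):
                    a[r][c] -= factor * a[col][c]
    return a[start][n] / a[start][start]


# Hypothetical call graph: A calls B (60%) or C (40%); B always calls C; C ends.
service_reliability = [0.99, 0.95, 0.98]
call_probabilities = [
    [0.0, 0.6, 0.4],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
]
system_reliability = composite_reliability(service_reliability, call_probabilities)
```

In this sketch the per-service values would come from the log-based real-time reliability estimates described in the abstract; here they are placeholders chosen only to make the example run.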