Semi-supervised log anomaly detection model based on kernel principal component analysis  Cited by: 1

Anomaly detection model of semi-supervised log based on kernel principal component analysis

Authors: GU Zhaojun (顾兆军)[1]; YE Jingwei (叶经纬); LIU Chunbo (刘春波); ZHANG Zhikai (张智凯); WANG Zhi (王志)

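Affiliations: [1] Information Security Evaluation Center, Civil Aviation University of China, Tianjin 300300, China [2] College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China [3] Hubei Sub-bureau of Middle South Air Traffic Management Bureau, CAAC, Wuhan, Hubei 432200, China [4] College of Cyber Science, Nankai University, Tianjin 300350, China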

Source: Journal of Jiangsu University: Natural Science Edition (《江苏大学学报(自然科学版)》), 2025, No. 1, pp. 64-72, 97 (10 pages in total)

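Funding: National Natural Science Foundation of China (U2333201, 61872202); National Key Research and Development Program of China (2021YFF0603902); Civil Aviation Safety Capacity Building Project (PESA2020100, PESA2021007, PESA2021009); Key Deployment Project of the Chinese Academy of Sciences (KFZD-SW-440); Natural Science Foundation of Tianjin (19JCYBJC15500).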

Abstract: For system log data whose distribution exhibits "group anomalies" and "local anomalies", the traditional semi-supervised log anomaly detection method ADOA (anomaly detection with partially observed anomalies) generates inaccurate pseudo-labels for unlabeled data. To address this problem, an improved semi-supervised log anomaly detection model is proposed. The known anomalous samples are clustered with k-means, and the reconstruction errors of the unlabeled samples are computed with kernel principal component analysis. Each sample's comprehensive anomaly score, computed from its reconstruction error and its similarity to the anomalous samples, serves as its pseudo-label. Sample weights for the LightGBM classifier are computed from the pseudo-labels, and the anomaly detection model is then trained. Parameter experiments examine how the proportion of training-set samples affects model performance. Experiments on two public datasets, HDFS and BGL, show that the model improves pseudo-label accuracy. Compared with the existing models DeepLog, LogAnomaly, LogCluster, PCA and PLELog, both precision and F1 score improve. Compared with the traditional ADOA anomaly detection method, the model's F1 score increases by 0.084 and 0.085 on the two datasets, respectively.
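The pseudo-labelling pipeline summarized in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the RBF kernel, the min-max normalization, the similarity formula, the mixing parameter `theta`, the threshold, the weighting rule, and all function names are assumptions introduced here for illustration, since the abstract does not give these details.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """RBF kernel matrix between the rows of A and the rows of B."""
    d2 = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


def kpca_reconstruction_error(X_fit, X_query, gamma=0.5, n_components=2):
    """Feature-space reconstruction error of X_query under kernel PCA fit on X_fit."""
    K = rbf_kernel(X_fit, X_fit, gamma)
    col_mean, total_mean = K.mean(axis=0), K.mean()
    Kc = K - col_mean[None, :] - col_mean[:, None] + total_mean  # centered kernel
    eigvals, eigvecs = np.linalg.eigh(Kc)
    top = np.argsort(eigvals)[::-1][:n_components]
    # Scale coefficients so each principal direction has unit norm in feature space.
    alpha = eigvecs[:, top] / np.sqrt(np.maximum(eigvals[top], 1e-12))
    Kq = rbf_kernel(X_query, X_fit, gamma)
    Kq_c = Kq - Kq.mean(axis=1, keepdims=True) - col_mean[None, :] + total_mean
    proj = Kq_c @ alpha
    # Squared norm of the centered feature vector; k(x, x) = 1 for the RBF kernel.
    norm2 = 1.0 - 2.0 * Kq.mean(axis=1) + total_mean
    return np.maximum(norm2 - (proj ** 2).sum(axis=1), 0.0)


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns the k cluster centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = d2.argmin(axis=1)
        for j in range(k):
            members = X[assign == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers


def combined_anomaly_score(X_unlabeled, X_anomalies, k=2, gamma=0.5,
                           n_components=2, theta=0.5):
    """Hypothetical combination: theta * normalized error + (1 - theta) * similarity."""
    err = kpca_reconstruction_error(X_unlabeled, X_unlabeled, gamma, n_components)
    err_norm = (err - err.min()) / (err.max() - err.min() + 1e-12)
    centers = kmeans(X_anomalies, k)
    d2 = ((X_unlabeled[:, None, :] - centers[None, :, :]) ** 2).sum(-1).min(axis=1)
    similarity = np.exp(-gamma * d2)  # assumed similarity to the nearest anomaly cluster
    return theta * err_norm + (1.0 - theta) * similarity


def pseudo_labels_and_weights(scores, threshold=0.5):
    """Assumed rule: label 1 above the threshold; weight reflects confidence in the label."""
    labels = (scores >= threshold).astype(int)
    span = scores.max() - scores.min() + 1e-12
    normal_conf = (scores.max() - scores) / span
    weights = np.where(labels == 1, scores, normal_conf)
    return labels, weights


# Synthetic demo: 100 normal points, 5 hidden anomalies, 10 known anomalies.
rng = np.random.default_rng(0)
X_unlabeled = np.vstack([rng.normal(0.0, 0.5, size=(100, 2)),
                         rng.normal(4.0, 0.1, size=(5, 2))])
X_known_anomalies = rng.normal(4.0, 0.1, size=(10, 2))
scores = combined_anomaly_score(X_unlabeled, X_known_anomalies)
labels, weights = pseudo_labels_and_weights(scores, threshold=0.4)
```

In a full pipeline, `labels` and `weights` (together with the known anomalies, labelled 1) would then be passed to a classifier that accepts per-sample weights, for example through the `sample_weight` argument of LightGBM's `LGBMClassifier.fit`. That final step is omitted here to keep the sketch dependency-free.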

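Keywords: system log; log anomaly detection; group anomaly; local anomaly; semi-supervised; reconstruction error; kernel principal component analysis; pseudo-label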

Classification number: TP393 [Automation and Computer Technology: Computer Application Technology]

 
