Authors: GU Zhaojun (顾兆军) [1]; HOU Jingwen (侯晶雯)
Affiliations: [1] Information Security Evaluation Center, Civil Aviation University of China, Tianjin 300300, China; [2] College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2024, No. 2, pp. 374-380 (7 pages)
Funding: Supported by the Civil Aviation Safety Capacity Building Fund (PESA2020100, PESA2021007, PESA2021009).
Abstract: To address the lack of class labels in real-world data and the impact of log parsing errors on model performance, this paper designs SemiCAD, a confidence-based semi-supervised anomaly detection model. The model first extracts features from the raw log data. Second, a positive-unlabeled (PU) learning algorithm based on hierarchical density-based spatial clustering of applications with noise (HDBSCAN) estimates pseudo-labels for the unlabeled data in the training set. Finally, the p-value statistic from conformal prediction measures the nonconformity among log data: several suitable ensemble algorithms are selected as nonconformity measure functions to compute nonconformity scores for collaborative detection, yielding a label for each log sequence under test together with a confidence for that label. Experiments on log data from a supercomputer (Blue Gene/L) and the Hadoop Distributed File System (HDFS) show that, compared with other log anomaly detection models, the model improves recall, F1 score, and other metrics, indicating that this semi-supervised model can effectively detect anomalies in logs that lack labels.
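The abstract does not give the formulas behind the p-value step, so the following is only a minimal sketch of the textbook conformal-prediction p-value and the usual way a label and its confidence are read off it. It is not SemiCAD's implementation: the function names, the two class names, and the numeric nonconformity scores are all hypothetical, and the scores stand in for whatever the ensemble-based nonconformity functions would produce.

```python
from typing import Dict, List, Tuple


def conformal_p_value(calibration_scores: List[float], test_score: float) -> float:
    """Share of scores (calibration plus the test item itself) that are
    at least as nonconforming as the test score."""
    at_least = sum(1 for s in calibration_scores if s >= test_score)
    return (at_least + 1) / (len(calibration_scores) + 1)


def predict_with_confidence(
    calibration: Dict[str, List[float]],
    test_scores: Dict[str, float],
) -> Tuple[str, float, Dict[str, float]]:
    """Pick the label with the largest p-value.

    Confidence is taken as 1 minus the second-largest p-value, which is
    the common conformal-prediction convention (an assumption here).
    """
    p_values = {
        label: conformal_p_value(calibration[label], test_scores[label])
        for label in calibration
    }
    ranked = sorted(p_values, key=p_values.get, reverse=True)
    best = ranked[0]
    second_p = p_values[ranked[1]] if len(ranked) > 1 else 0.0
    return best, 1.0 - second_p, p_values


# Hypothetical nonconformity scores of labeled calibration sequences,
# each measured against its own class (lower means more typical).
calibration = {
    "normal": [0.10, 0.20, 0.15, 0.30],
    "anomaly": [0.20, 0.40, 0.25, 0.35],
}
# Hypothetical scores of one test sequence under each candidate label.
test_scores = {"normal": 0.85, "anomaly": 0.30}

label, confidence, p_values = predict_with_confidence(calibration, test_scores)
print(label, confidence, p_values)
```

In this toy input the test sequence is far less typical of the normal class than any calibration sequence, so its p-value for "normal" is small, the "anomaly" label wins, and the low runner-up p-value translates into a high confidence.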
Classification No.: TP393 [Automation and Computer Technology: Computer Application Technology]