Lightweight log template extraction method based on word pattern rules

Authors: GU Zhaojun[1], ZHANG Zhikai, LIU Chunbo, YE Jingwei (Information Security Evaluation Center, Civil Aviation University of China, Tianjin 300300, China; College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China)

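Affiliations: [1] Information Security Evaluation Center, Civil Aviation University of China, Tianjin 300300; [2] College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300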

Source: Modern Electronics Technique (《现代电子技术》), 2024, No. 21, pp. 156-164 (9 pages)

Funding: Joint Fund Project of Civil Aviation University of China (U2333201).

Abstract: Traditional rule-based log parsing methods require rules to be written separately for each type of log, and manual intervention is needed again whenever a system update introduces new log patterns. Log parsing methods based on deep learning achieve high accuracy, but at high computational complexity. To reduce both the labor cost and the computational complexity of log parsing, a lightweight log template extraction method based on word pattern rules is proposed. The method consists of three parts: initial rule set generation, word pattern rule application, and potential error sample discovery. First, adaptive random sampling is applied to the original logs to obtain representative logs with low mutual similarity. Then, an initial word pattern rule set is extracted based on expert feedback, and the word pattern rule application module processes the original logs and extracts log templates. Finally, the potential error sample discovery module examines the resulting log template clusters, identifies potentially misclassified samples, and updates the rule set accordingly. Experiments on 16 public log datasets show that the proposed method reaches an average accuracy of 97.8%, roughly on par with log parsing algorithms based on deep learning. In terms of computational efficiency, its single-threaded parsing speed reaches 20,000 log entries per second, and performance continues to improve as the number of available cores increases, which meets the needs of fault diagnosis and security analysis of system logs.
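
The abstract does not give the actual rule set or the sampling procedure, so the following is only a minimal Python sketch of the general idea, under stated assumptions. The rules in `WORD_PATTERN_RULES`, the `<*>` placeholder, the Jaccard similarity measure, the `threshold` value, and the function names `sample_representatives`, `extract_template`, and `group_by_template` are all hypothetical and chosen for illustration. In particular, the sampler here is a plain similarity-filtered random sample standing in for the adaptive random sampling mentioned in the abstract, not a reconstruction of it.

```python
import random
import re

# Hypothetical word-pattern rules: each regex is tested against one whole token.
# These are illustrative assumptions, not the rule set used in the paper.
WORD_PATTERN_RULES = [
    re.compile(r"^\d{1,3}(\.\d{1,3}){3}(:\d+)?$"),  # IPv4 address, optional port
    re.compile(r"^0x[0-9a-fA-F]+$"),                # hexadecimal value
    re.compile(r"^[+-]?\d+(\.\d+)?$"),              # integer or decimal number
    re.compile(r"^(/[\w.-]+)+/?$"),                 # Unix-style path
]
PLACEHOLDER = "<*>"  # assumed marker for a variable field


def jaccard(a, b):
    """Token-set Jaccard similarity between two log lines."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def sample_representatives(lines, threshold=0.5, seed=0):
    """Visit lines in random order and keep a line only if it is
    dissimilar (similarity < threshold) to every line kept so far."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    kept = []
    for line in shuffled:
        if all(jaccard(line, k) < threshold for k in kept):
            kept.append(line)
    return kept


def extract_template(line):
    """Replace every token that matches a word-pattern rule with the placeholder."""
    out = []
    for token in line.split():
        if any(rule.match(token) for rule in WORD_PATTERN_RULES):
            out.append(PLACEHOLDER)
        else:
            out.append(token)
    return " ".join(out)


def group_by_template(lines):
    """Cluster raw log lines by their extracted template."""
    groups = {}
    for line in lines:
        groups.setdefault(extract_template(line), []).append(line)
    return groups
```

With these assumed rules, the lines `Connection from 10.0.0.5:8080 closed after 35 ms` and `Connection from 192.168.1.7:22 closed after 120 ms` both map to the template `Connection from <*> closed after <*> ms`. Because each line is processed independently of the others, a sketch like this could be spread across worker processes, which is consistent with the multi-core scaling the abstract reports.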

Keywords: log parsing; template extraction; word pattern rules; regular expression matching; heuristic strategy; rule set

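Classification: TN911-34 [Electronics and Telecommunications: Communication and Information Systems]; TP391 [Electronics and Telecommunications: Information and Communication Engineering]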

 
