Research on Method of Log Pattern Extracting in High-Performance Computing Environment  (Cited by: 2)


Authors: WANG Xiaodong; ZHAO Yining[1]; XIAO Haili[1]; WANG Xiaoning[1]; CHI Xuebin[1,2]

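Affiliations: [1] Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; [2] University of Chinese Academy of Sciences, Beijing 100049, China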

Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), 2022, No. 10, pp. 2264-2272 (9 pages)

Funding: Strategic Priority Research Program of the Chinese Academy of Sciences (Class A) (XDA19020101).

Abstract: Log analysis plays a vital role in keeping computer systems running stably. Logs, however, are usually unstructured, which hinders automated analysis, so automatically extracting log patterns and converting logs into structured data is of great practical significance. This paper proposes the LDmatch algorithm, a log pattern extraction algorithm based on word matching rate. Traditional log matching algorithms compute similarity with a one-to-one word matching method, whereas LDmatch computes the similarity between two logs from the longest common subsequence (LCS) of the words they contain and classifies logs on that basis. LDmatch can also obtain and update log templates in real time. In addition, the algorithm's pattern warehouse is stored in a hash-table-based data structure, which refines the classification of logs and reduces the number of comparisons during log matching, thereby improving matching efficiency. To verify these advantages, LDmatch is applied to open-source data sets and to a log data set actually generated by the China National Grid (CNGrid), and it is compared experimentally against several other log pattern extraction algorithms. The results show that the algorithm has advantages in accuracy, robustness, and efficiency.
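The abstract names three ingredients: an LCS-based word matching rate, templates that are updated as logs arrive, and a hash-table pattern store that limits how many templates each log is compared against. The sketch below illustrates those ideas in Python. It is not the authors' LDmatch implementation. The similarity formula (LCS length divided by the longer token count), the `<*>` wildcard, the choice of the first token as hash key, the threshold value, and all function and class names are assumptions made for illustration only.

```python
from collections import defaultdict

WILDCARD = "<*>"  # assumed placeholder for variable parts of a template


def _lcs_table(a, b):
    """Suffix DP table: dp[i][j] is the LCS length of a[i:] and b[j:]."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    return dp


def similarity(a, b):
    """Assumed word matching rate: LCS length over the longer token count."""
    if not a or not b:
        return 0.0
    return _lcs_table(a, b)[0][0] / max(len(a), len(b))


def merge(template, tokens):
    """Keep the common subsequence; collapse every gap into one wildcard."""
    dp = _lcs_table(template, tokens)
    out = []
    i = j = 0

    def gap():
        if not out or out[-1] != WILDCARD:
            out.append(WILDCARD)

    while i < len(template) and j < len(tokens):
        if template[i] == tokens[j]:
            out.append(template[i])
            i += 1
            j += 1
        else:
            gap()
            if dp[i + 1][j] >= dp[i][j + 1]:
                i += 1
            else:
                j += 1
    if i < len(template) or j < len(tokens):
        gap()  # trailing leftovers on either side also become a wildcard
    return out


class PatternStore:
    """Hash table of templates; the bucket key (first token) is an assumption."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.buckets = defaultdict(list)

    def add(self, line):
        """Match a log line within its bucket, then update or create a template."""
        tokens = line.split()
        if not tokens:
            return None
        bucket = self.buckets[tokens[0]]
        best_idx, best_sim = -1, 0.0
        for idx, template in enumerate(bucket):
            sim = similarity(template, tokens)
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        if best_idx >= 0 and best_sim >= self.threshold:
            bucket[best_idx] = merge(bucket[best_idx], tokens)
            return bucket[best_idx]
        bucket.append(tokens)
        return tokens


store = PatternStore(threshold=0.5)
store.add("Connection from 10.0.0.1 closed")
template = store.add("Connection from 10.0.0.2 closed")
```

Because the LCS tolerates inserted words, a longer line such as `Connection from 10.0.0.3 port 22 closed` can still match the same template in this sketch, whereas a strictly position-by-position comparison would misalign after the third word. Only the templates in one bucket are compared against each incoming line, which is the sense in which a hash-table store reduces comparisons.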

Keywords: log pattern extraction; word matching rate; log template; hash table

Classification code: TP316 [Automation and Computer Technology: Computer Software and Theory]

 
