Authors: WANG Xiaodong (王晓东), ZHAO Yining (赵一宁)[1], XIAO Haili (肖海力)[1], WANG Xiaoning (王小宁)[1], CHI Xuebin (迟学斌)[1,2]
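Affiliations: [1] Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; [2] University of Chinese Academy of Sciences, Beijing 100049, China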
Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), 2022, No. 10, pp. 2264-2272 (9 pages)
Funding: Strategic Priority Research Program of the Chinese Academy of Sciences (Category A) (XDA19020101).
Abstract: Log analysis plays an important role in the stable operation of computer systems. However, logs are usually unstructured, which hinders automated analysis, so automatically extracting log patterns and converting logs into structured data is of great practical significance. This paper proposes the LDmatch algorithm, a log pattern extraction algorithm based on word matching rate. Traditional log matching algorithms use one-to-one word matching when computing similarity, whereas LDmatch computes the similarity between two logs from the longest common subsequence (LCS) of the words they contain and classifies logs on that basis. LDmatch can also obtain and update log templates in real time. In addition, the algorithm's pattern warehouse is stored in a hash-table-based data structure, which refines the classification of logs and reduces the number of comparisons during log matching, thereby improving matching efficiency. To verify these advantages, LDmatch is applied to open-source datasets and to a log dataset actually generated by the national high-performance computing environment (CNGrid), and is compared with several other log pattern extraction algorithms. The experimental results show that the algorithm has advantages in accuracy, robustness, and efficiency.
Classification: TP316 [Automation and Computer Technology: Computer Software and Theory]
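
The two ideas named in the abstract, word-level LCS similarity and a hash-table pattern store, can be illustrated with a minimal Python sketch. This is not the LDmatch implementation: the abstract gives neither the similarity formula, the bucket key, nor the template update rule, so the normalization by the longer word count, the first-word bucket key, the 0.5 threshold, and the names `lcs_words`, `similarity`, and `PatternStore` are all assumptions made here for illustration.

```python
from collections import defaultdict


def lcs_words(a, b):
    """Return one longest common subsequence of two word lists."""
    m, n = len(a), len(b)
    # dp[i][j] = LCS length of a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    # Walk back through the table to recover the subsequence itself.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]


def similarity(a, b):
    """Assumed formula: LCS length divided by the longer word count."""
    if not a or not b:
        return 0.0
    return len(lcs_words(a, b)) / max(len(a), len(b))


class PatternStore:
    """Hypothetical pattern store: a dict of template lists.

    Bucketing by first word is an assumption; it only illustrates how a
    hash lookup can shrink the set of templates a new log is compared to.
    """

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.buckets = defaultdict(list)  # first word -> list of templates

    def add(self, line):
        words = line.split()
        if not words:
            return None
        bucket = self.buckets[words[0]]
        best_index, best_score = -1, 0.0
        for index, template in enumerate(bucket):
            score = similarity(words, template)
            if score > best_score:
                best_index, best_score = index, score
        if best_index >= 0 and best_score >= self.threshold:
            # Simplified update: keep only the words shared with the new log.
            bucket[best_index] = lcs_words(words, bucket[best_index])
            return bucket[best_index]
        bucket.append(words)
        return words
```

Feeding lines to `PatternStore.add` one at a time mirrors the streaming use the abstract describes: each call either refines the best-matching template in its bucket or registers the line as a new template. A fuller version would likely mark the dropped positions with a wildcard rather than discarding them.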